of the third round to go to the fourth round, thus doubling the number of
alternatives.) Of these combinations, only one will be up forty, and only
one will be down forty. The rest will hover around the middle, here zero.
We can already see that in this type of randomness extremes are exceedingly
rare. One in 1,099,511,627,776 is up forty out of forty tosses.
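The count of combinations can be checked directly. What follows is a minimal sketch, not part of the original argument: the function name `paths_with_net` and the choice of "within ten of zero" as the middle band are illustrative assumptions.

```python
import math

N_TOSSES = 40  # number of flips per trial, as in the text

def paths_with_net(net: int, n: int = N_TOSSES) -> int:
    """Number of toss sequences ending at a given net score (wins minus losses)."""
    if abs(net) > n or (net + n) % 2:
        return 0  # unreachable: out of range or wrong parity
    heads = (net + n) // 2
    return math.comb(n, heads)

total = 2 ** N_TOSSES  # every toss doubles the number of alternatives

# Share of all sequences that finish within ten of zero (an assumed "middle" band).
share_near_middle = sum(paths_with_net(k) for k in range(-10, 11)) / total

print(f"total combinations: {total:,}")
print(f"up forty: {paths_with_net(40)}, down forty: {paths_with_net(-40)}")
print(f"share within ten of zero: {share_near_middle:.3f}")
```

Exactly one sequence sits at each extreme, while the great majority of the roughly 1.1 trillion sequences end close to zero.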
If you perform the exercise of forty flips once per hour, the odds of getting
40 ups in a row are so small that it would take a great many forty-flip trials
to see it. Assuming you take a few breaks to eat, argue with your
friends and roommates, have a beer, and sleep, you can expect to wait
close to four million lifetimes to get a 40-up outcome (or a 40-down outcome)
just once. And consider the following. Assume you play one additional
round, for a total of 41; to get 41 straight heads would take eight
million lifetimes! Going from 40 to 41 halves the odds. This is a key attribute
of the nonscalable framework to analyzing randomness: extreme
deviations decrease at an increasing rate. You can expect to toss 50 heads
in a row once in four billion lifetimes!
THOSE GRAY SWANS OF EXTREMISTAN
FIGURE 9: NUMBERS OF WINS TOSSED
Result of forty tosses. We see the proto-bell curve emerging.
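The waiting times quoted for 40, 41, and 50 straight heads follow from simple arithmetic. The sketch below is illustrative only: the text does not say how many trials fit in a day or how long a lifetime is, so the values of `trials_per_day` (8) and `years_per_lifetime` (80) are assumptions, and the exact figures shift with them.

```python
def one_in(n_flips: int) -> int:
    """Odds against n straight heads: one chance in 2**n."""
    return 2 ** n_flips

def expected_lifetimes(n_flips: int,
                       trials_per_day: float = 8.0,        # assumed: one trial per waking hour with breaks
                       years_per_lifetime: float = 80.0    # assumed lifetime length
                       ) -> float:
    """Expected number of lifetimes of trials before seeing n straight heads once."""
    trials_per_lifetime = trials_per_day * 365 * years_per_lifetime
    return one_in(n_flips) / trials_per_lifetime

for n in (40, 41, 50):
    print(f"{n} heads in a row: one in {one_in(n):,}, "
          f"about {expected_lifetimes(n):,.0f} lifetimes")
```

Under these assumptions, 40 heads takes a few million lifetimes, each extra toss doubles the wait, and ten extra tosses multiply it by 1,024, which is how millions become billions.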
We are not yet fully in a Gaussian bell curve, but we are getting dangerously
close. This is still proto-Gaussian, but you can see the gist. (Actually,
you will never encounter a Gaussian in its purity since it is a Platonic
form—you just get closer but cannot attain it.) However, as you can see in
Figure 9, the familiar bell shape is starting to emerge.
How do we get even closer to the perfect Gaussian bell curve? By refining
the flipping process. We can either flip 40 times for $1 a flip or
4,000 times for ten cents a flip, and add up the results. Your expected risk
is about the same in both situations—and that is a trick. The equivalence
in the two sets of flips has a little nonintuitive hitch. We multiplied the
number of bets by 100, but divided the bet size by 10—don't look for a
reason now, just assume that they are "equivalent." The overall risk is
equivalent, but now we have opened up the possibility of winning or losing
400 times in a row. The odds are about one in 1 with 120 zeroes after
it, that is, one in 1,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000 times.
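For readers who do want the reason behind the "equivalence": for independent flips of equal size, the standard deviation of the total grows with the square root of the number of flips, so multiplying the flips by 100 while dividing the bet by 10 leaves it unchanged. The sketch below assumes that this standard deviation is the "risk" meant here. The function name `total_risk` is illustrative.

```python
import math

def total_risk(n_flips: int, bet: float) -> float:
    """Standard deviation of the summed outcome of n independent +/- bet flips."""
    return bet * math.sqrt(n_flips)

# The betting schedules named in the text: (number of flips, dollars per flip).
schedules = [(40, 1.00), (4_000, 0.10), (400_000, 0.01)]
for n, bet in schedules:
    print(f"{n:>7,} flips at ${bet:.2f}: risk about ${total_risk(n, bet):.2f}")

# Odds against 400 wins in a row, expressed as a count of decimal digits.
zeros = math.log10(2 ** 400)
print(f"2**400 is roughly 10**{zeros:.1f}")
```

Each schedule carries the same risk of a little over six dollars, and 2 to the power 400 comes out near a 1 followed by 120 zeroes, matching the figure quoted.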
Continue the process for a while. We go from 40 tosses for $1 each to
4,000 tosses for 10 cents, to 400,000 tosses for 1 cent, getting closer and
closer to a Gaussian. Figure 10 shows results spread between -40 and 40,
namely eighty plot points. The next one would bring that up to 8,000
points.
THE BELL CURVE, THAT GREAT INTELLECTUAL FRAUD
FIGURE 10: A MORE ABSTRACT VERSION: PLATO'S CURVE
An infinite number of tosses.
Let's keep going. We can flip 4,000 times staking a tenth of a penny.
How about 400,000 times at 1/1000 of a penny? As a Platonic form, the
pure Gaussian curve is principally what happens when we have an infinity
of tosses per round, with each bet infinitesimally small. Do not bother trying
to visualize the results, or even make sense out of them. We can no
longer talk about an "infinitesimal" bet size (since we have an infinity of
these, and we are in what mathematicians call a continuous framework).
The good news is that there is a substitute.
We have moved from a simple bet to something completely abstract.
We have moved from observations into the realm of mathematics. In
mathematics things have a purity to them.
Now, something completely abstract is not supposed to exist, so please
do not even make an attempt to understand Figure 10. Just be aware of its
use. Think of it as a thermometer: you are not supposed to understand
what the temperature means in order to talk about it. You just need to
know the correspondence between temperature and comfort (or some
other empirical consideration). Sixty degrees corresponds to pleasant
weather; ten below is not something to look forward to. You don't necessarily
care about the actual speed of the collisions among particles that
more technically explains temperature. Degrees are, in a way, a means for
your mind to translate some external phenomena into a number. Likewise,
the Gaussian bell curve is set so that 68.2 percent of the observations fall
between minus one and plus one standard deviations away from the average.
I repeat: do not even try to understand whether standard deviation is
average deviation—it is not, and a large (too large) number of people
using the word standard deviation do not understand this point. Standard
deviation is just a number that you scale things to, a matter of mere correspondence
if phenomena were Gaussian.
These standard deviations are often nicknamed "sigma." People also
talk about "variance" (same thing: variance is the square of the sigma, i.e.,
of the standard deviation).
Note the symmetry in the curve. You get the same results whether the
sigma is positive or negative. The odds of falling below -4 sigmas are the
same as those of exceeding 4 sigmas, here 1 in 32,000 times.
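Both figures can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard `math.erf` and `math.erfc`. The helper names `share_within` and `upper_tail` are illustrative.

```python
import math

def share_within(k_sigma: float) -> float:
    """Share of Gaussian observations within +/- k standard deviations of the mean."""
    return math.erf(k_sigma / math.sqrt(2))

def upper_tail(k_sigma: float) -> float:
    """Probability of exceeding +k standard deviations (equal, by symmetry, to falling below -k)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))

print(f"within one sigma: {share_within(1):.1%}")
print(f"beyond four sigmas: about 1 in {1 / upper_tail(4):,.0f}")
```

The share within one sigma comes out a little over 68 percent, and the four-sigma tail is on the order of 1 in 32,000, in line with the rounded numbers in the text.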
As the reader can see, the main point of the Gaussian bell curve is, as I
have been saying, that most observations hover around the mediocre, the
mean, while the odds of a deviation decline faster and faster (exponentially)
as you move away from the mean. If you need to retain one single
piece of information, just remember this dramatic speed of decrease in the
odds as you move away from the average. Outliers are increasingly unlikely.
You can safely ignore them.
This property also generates the supreme law of Mediocristan: given
the paucity of large deviations, their contribution to the total will be vanishingly
small.
In the height example earlier in this chapter, I used units of deviations
of ten centimeters, showing how the incidence declined as the height increased.
These were one sigma deviations; the height table also provides
an example of the operation of "scaling to a sigma" by using the sigma as
a unit of measurement.
Those Comforting Assumptions
Note the central assumptions we made in the coin-flip game that led to the
proto-Gaussian, or mild randomness.
First central assumption: the flips are independent of one another. The
coin has no memory. The fact that you got heads or tails on the previous
flip does not change the odds of your getting heads or tails on the next
one. You do not become a "better" coin flipper over time. If you introduce
memory, or skills in flipping, the entire Gaussian business becomes shaky.
Recall our discussions in Chapter 14 on preferential attachment and
cumulative advantage. Both theories assert that winning today makes you
more likely to win in the future. Therefore, probabilities are dependent on
history, and the first central assumption leading to the Gaussian bell curve
fails in reality. In games, of course, past winnings are not supposed to
translate into an increased probability of future gains—but not so in real
life, which is why I worry about teaching probability from games. But
when winning leads to more winning, you are far more likely to see forty
wins in a row than with a proto-Gaussian.
Second central assumption: no "wild" jump. The step size in the building
block of the basic random walk is always known, namely one step.
There is no uncertainty as to the size of the step. We did not encounter situations
in which the move varied wildly.
Remember that if either of these two central assumptions is not met,
your moves (or coin tosses) will not cumulatively lead to the bell curve.
Depending on what happens, they can lead to the wild Mandelbrotian-style
scale-invariant randomness.
"The Ubiquity of the Gaussian"
One of the problems I face in life is that whenever I tell people that the
Gaussian bell curve is not ubiquitous in real life, only in the minds of statisticians,
they require me to "prove it"—which is easy to do, as we will
see in the next two chapters, yet nobody has managed to prove the opposite.
Whenever I suggest a process that is not Gaussian, I am asked to justify
my suggestion and to, beyond the phenomena, "give them the theory
behind it." We saw in Chapter 14 the rich-get-richer models that were
proposed in order to justify not using a Gaussian. Modelers were forced
to spend their time writing theories on possible models that generate the
scalable—as if they needed to be apologetic about it. Theory shmeory! I
have an epistemological problem with that, with the need to justify the
world's failure to resemble an idealized model that someone blind to reality
has managed to promote.
My technique, instead of studying the possible models generating
non-bell curve randomness, hence making the same errors of blind theorizing,
is to do the opposite: to know the bell curve as intimately as I can
and identify where it can and cannot hold. I know where Mediocristan is.
To me it is frequently (nay, almost always) the users of the bell curve who
do not understand it well, and have to justify it, and not the opposite.
This ubiquity of the Gaussian is not a property of the world, but a
problem in our minds, stemming from the way we look at it.
The next chapter will address the scale invariance of nature and address
the properties of the fractal. The chapter after that will probe the misuse
of the Gaussian in socioeconomic life and "the need to produce theories."
I sometimes get a little emotional because I've spent a large part of my
life thinking about this problem. Since I started thinking about it, and conducting
a variety of thought experiments as I have above, I have not for
the life of me been able to find anyone around me in the business and statistical
world who was intellectually consistent in that he both accepted
the Black Swan and rejected the Gaussian and Gaussian tools. Many people
accepted my Black Swan idea but could not take it to its logical conclusion,
which is that you cannot use one single measure for randomness
called standard deviation (and call it "risk"); you cannot expect a simple
answer to characterize uncertainty. To go the extra step requires courage,
commitment, an ability to connect the dots, a desire to understand randomness
fully. It also means not accepting other people's wisdom as
gospel. Then I started finding physicists who had rejected the Gaussian
tools but fell for another sin: gullibility about precise predictive models,
mostly elaborations around the preferential attachment of Chapter 14—
another form of Platonicity. I could not find anyone with depth and scientific
technique who looked at the world of randomness and understood its
nature, who looked at calculations as an aid, not a principal aim. It took
me close to a decade and a half to find that thinker, the man who made
many swans gray: Mandelbrot—the great Benoît Mandelbrot.
THE AESTHETICS OF RANDOMNESS
Mandelbrot's library—Was Galileo blind?—Pearls to swine—Self-affinity—How
the world can be complicated in a simple way, or, perhaps, simple in a very
complicated way
THE POET OF RANDOMNESS
It was a melancholic afternoon when I smelled the old books in Benoît
Mandelbrot's library. This was on a hot day in August 2005, and the heat
exacerbated the musty odor of the glue of old French books bringing on
powerful olfactory nostalgia. I usually succeed in repressing such nostalgic
excursions, but not when they sneak up on me as music or smell. The odor
of Mandelbrot's books was that of French literature, of my parents' library,
of the hours spent in bookstores and libraries when I was a teenager
when many books around me were (alas) in French, when I thought that
Literature was above anything and everything. (I haven't been in contact
with many French books since my teenage days.) However abstract I
wanted it to be, Literature had a physical embodiment, it had a smell, and
this was it.
The afternoon was also gloomy because Mandelbrot was moving
away, exactly when I had become entitled to call him at crazy hours just
because I had a question, such as why people didn't realize that the 80/20
could be 50/01. Mandelbrot had decided to move to the Boston area, not
to retire, but to work for a research center sponsored by a national laboratory.
Since he was moving to an apartment in Cambridge, and leaving
his oversize house in the Westchester suburbs of New York, he had invited
me to come take my pick of his books.
Even the titles of the books had a nostalgic ring. I filled up a box with
French titles, such as a 1949 copy of Henri Bergson's Matière et mémoire,
which it seemed Mandelbrot bought when he was a student (the smell!).
After having mentioned his name left and right throughout this book,
I will finally introduce Mandelbrot, principally as the first person with an
academic title with whom I ever spoke about randomness without feeling
defrauded. Other mathematicians of probability would throw at me theorems
with Russian names such as "Sobolev," "Kolmogorov," Wiener measure,
without which they were lost; they had a hard time getting to the
heart of the subject or exiting their little box long enough to consider its
empirical flaws. With Mandelbrot, it was different: it was as if we both
originated from the same country, meeting after years of frustrating exile,
and were finally able to speak in our mother tongue without straining. He
is the only flesh-and-bones teacher I ever had—my teachers are usually
books in my library. I had way too little respect for mathematicians dealing
with uncertainty and statistics to consider any of them my teachers—
in my mind mathematicians, trained for certainties, had no business