The Laboratorium : And the Results Are In

Thank you all for your answers to my question about tanker capacity. I received the following fourteen guesses:

10,000
100,000
500,000
1,000,000
3,170,064
10,000,000
20,000,000
25,000,000
40,000,000
50,000,000
101,520,224
150,000,000
200,000,000
200,000,000

The correct answer is, to two significant figures, 170,000,000, which is rather towards the higher end of the guesses. Both the median (22,500,000) and arithmetic mean (57,235,734) are substantially smaller than the true value.
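For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the median and the arithmetic mean from the fourteen guesses listed above. It uses only the standard library's `statistics` module; the names `guesses` and `TRUE_VALUE` are my own, chosen for illustration.

```python
import statistics

# The fourteen guesses, exactly as listed above.
guesses = [
    10_000, 100_000, 500_000, 1_000_000, 3_170_064, 10_000_000,
    20_000_000, 25_000_000, 40_000_000, 50_000_000, 101_520_224,
    150_000_000, 200_000_000, 200_000_000,
]

# The correct answer, to two significant figures.
TRUE_VALUE = 170_000_000

# With an even number of guesses, the median is the midpoint of the 7th and 8th.
median = statistics.median(guesses)
mean = statistics.mean(guesses)

# How many guesses fall short of the true value?
too_low = sum(g < TRUE_VALUE for g in guesses)

print(f"median: {median:,.0f}")   # 22,500,000
print(f"mean:   {mean:,.2f}")     # 57,235,734.86
print(f"guesses below the true value: {too_low} of {len(guesses)}")  # 12 of 14
```

Only the two guesses of 200,000,000 overshoot, which is why both statistics land well below the true value.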

And now for the other shoe. In Sunstein’s original example, he surveyed his colleagues at the University of Chicago law school about the weight in pounds of the space shuttle’s fuel. His point was that when the members of a group know nothing at all about a topic, there is no particular reason that statistics computed from their guesses will be any better than the guess of a random group member. (Put another way, for a crowd to be wise, its average member must know something rather than nothing.) I thought that this conclusion was a little too strong; people might know extremely little about a question, but there might be more appropriate statistics about their guesses that would extract and combine what little they do know.

In Sunstein’s experiment, the true value was about 4 million pounds, but the median was a paltry 200,000. On the other hand, the mean was outrageously high, at 55 million, thanks to one absurdly large guess. Looking at his report of the experiment, it struck me that the geometric mean might have done better than the median or the mean. My theory, such as it was, was that guesses about unknown huge numbers are little more than guesses about order of magnitude, so that the real statistic we ought to be averaging is the number of zeroes at the end of people’s answers.

Two things are apparent from this little experiment. First, my hypothesis is in bad shape. Your geometric mean (8,462,166.40) is further from the true value than either your arithmetic mean or your median. In hindsight, I should have expected this. Sunstein had a sample in which most people’s guesses were substantially too small; I would assume this is because people get uncomfortable with really big numbers and because people tend to underestimate volumes, even when their intuition about linear dimensions is good. Thus, the median was also too small. His mean, on the other hand, was dominated by one almost implausibly large guess.

It was something of a coincidence, then, that the geometric mean (which is smaller than the arithmetic mean) was probably closeish to correct. His outlier just happened to be the right size for it to work. With a more restrained set of guesses and no big outliers, the arithmetic mean is less wrong than the geometric. I would be interested to repeat this experiment in other contexts to see how often one gets outliers of that sort and by how much they tend to be off. But I no longer expect the geometric mean to have any systematic advantage for problems of this kind.
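The geometric mean of the tanker guesses can be reproduced in the same spirit as the "number of zeroes" idea: average the base-10 logarithms of the guesses, then raise 10 to that average. The Python sketch below does this; as before, the variable names are mine, and it is an illustration rather than a record of how the figure was originally computed.

```python
import math

# The fourteen tanker guesses from the list near the top of the post.
guesses = [
    10_000, 100_000, 500_000, 1_000_000, 3_170_064, 10_000_000,
    20_000_000, 25_000_000, 40_000_000, 50_000_000, 101_520_224,
    150_000_000, 200_000_000, 200_000_000,
]

# The correct answer, to two significant figures.
TRUE_VALUE = 170_000_000

# Average the orders of magnitude (base-10 logs), then convert back.
mean_log10 = sum(math.log10(g) for g in guesses) / len(guesses)
geometric_mean = 10 ** mean_log10

print(f"average order of magnitude of the guesses: {mean_log10:.2f}")
print(f"order of magnitude of the true value:      {math.log10(TRUE_VALUE):.2f}")
print(f"geometric mean: {geometric_mean:,.0f}")
print(f"true value / geometric mean: {TRUE_VALUE / geometric_mean:.1f}")
```

The result comes out at roughly 8.46 million, about a factor of twenty below the true value. Because the handful of very small guesses (10,000 and 100,000) count for as much on a log scale as the very large ones, they drag the geometric mean down much further than they drag the arithmetic mean.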

The second thing I learned from this experiment is that (for a sample size of one experiment) my readers are on average more knowledgeable than the Chicago law faculty.