COMMON MISTEAKS
MISTAKES IN
USING STATISTICS: Spotting and Avoiding Them
Unusual Events
If the research question being studied involves unusual events,
the mean or median is
not adequate as a summary statistic.
Examples
1. If you are deciding what capacity air conditioner you
need, the average yearly (or even average summer) temperature will not
give you guidance in choosing an air conditioner that will keep your
house cool on the hottest days. For this purpose, it would be much more
helpful to know the highest temperature you might encounter, or how
many days you can expect to be above a certain temperature.
2. Traffic safety interventions are typically aimed at high-speed
situations. So the average speed is not as useful as, say, the 85th
percentile of speed.
3. Pregnancy interventions are often aimed at reducing the incidence of
low birth weight babies. Neither the mean nor the median birth weight
in a population gives you information about low birth weight babies, so
neither the mean nor the median is a suitable summary statistic in this
situation. However, the percentage of births in the low weight category
might be a suitable summary statistic. ("Low weight" might be defined
as weight in a range known to be associated with greater risk of health
problems, or it might be defined as weight below a certain percentile
of a reference population of newborns.)
4. If two medications for lowering blood pressure have been compared in
a well-designed, carefully carried out randomized clinical trial, and
the average drop in blood pressure for Drug A is more than that for
Drug B, we cannot conclude just from this information alone that Drug A
is better than Drug B. We also need to consider the incidence of
undesirable side effects. One might be that for some patients, Drug A
lowers blood pressure to dangerously low levels. Or it might be the
case that for some patients, Drug A actually increases blood pressure.
Thus in this situation, we need to consider extreme events in both
directions.
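The examples above share one pattern: the question concerns a tail of the distribution, so the summary statistic should describe that tail. The following is a minimal Python sketch of that idea, using only the standard library. All data values, the 100-degree and 2500-gram cutoffs, and the function names are hypothetical and chosen purely for illustration; the nearest-rank percentile used here is only one of several common percentile conventions.

```python
import math
import statistics


def share_beyond(values, cutoff, side="above"):
    """Fraction of values strictly above (or strictly below) a cutoff."""
    if side == "above":
        hits = sum(1 for v in values if v > cutoff)
    else:
        hits = sum(1 for v in values if v < cutoff)
    return hits / len(values)


def nearest_rank_percentile(values, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank convention."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into ordered
    return ordered[rank - 1]


# Hypothetical daily high temperatures (degrees Fahrenheit).
daily_highs = [88, 91, 95, 102, 99, 87, 93, 105, 97, 90]
# Hypothetical vehicle speeds (miles per hour) at one location.
speeds = [52, 55, 58, 60, 61, 63, 64, 66, 70, 88]
# Hypothetical birth weights (grams); the 2500 g cutoff is an assumption.
birth_weights = [3400, 2300, 3650, 3100, 2450, 3900, 3300, 2900]

# Example 1: the mean says little about the hottest days.
print("mean high:", statistics.mean(daily_highs))
print("hottest day:", max(daily_highs))
print("share of days above 100:", share_beyond(daily_highs, 100))

# Example 2: compare the mean speed with the 85th percentile.
print("mean speed:", statistics.mean(speeds))
print("85th percentile speed:", nearest_rank_percentile(speeds, 85))

# Example 3: the median hides the share of low-weight births.
print("median birth weight:", statistics.median(birth_weights))
print("share below 2500 g:", share_beyond(birth_weights, 2500, side="below"))
```

With these made-up numbers, the mean high of 94.7 hides the fact that two of the ten days exceeded 100 degrees, and the mean speed of 63.7 sits well below the 85th percentile of 70. For a situation like Example 4, the same kind of tail fraction could be computed in both directions.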
Unusual events such as earthquakes and extreme behavior in the stock
market can have large effects, so they are important to consider. They
have come to be called "Black Swan Events," a term popularized by the
2007 book The Black Swan (see Note 1), by risk analyst Nassim Nicholas
Taleb.
Many techniques have been developed for studying unusual events;
however, these techniques are not usually mentioned in introductory
courses in statistics. And, like other statistical techniques, they are
not "one-size-fits-all." Some references are given in Notes 2 and 3 below.
Notes:
1. Taleb, N. N. (2007) The Black Swan: The Impact of the Highly
Improbable, Random House. See also the "Special Section: Reviews
of The Black Swan" in The American Statistician, Vol. 61, No. 3,
August 2007, pp. 189-200, and the review by David Aldous. Taleb's
earlier book, Fooled by Randomness: The Hidden Role of Chance in Life
and in the Markets (Random House, 2001), may also be of interest.
2. For extreme events, references include:
- Castillo, E. (1988) Extreme Value Theory in Engineering, Academic
Press, New York
- Coles, Stuart (2001) An Introduction to Statistical Modeling
of Extreme Values, Springer
- Embrechts, P., C. Klüppelberg, and T. Mikosch (1997) Modelling
Extremal Events for Insurance and Finance, Springer-Verlag, Berlin
- Freedman, D. A. and P. B. Stark (2003) "What is the chance of an
earthquake?" In Earthquake Science and Seismic Risk Reduction, edited
by F. Mulargia and R. J. Geller. NATO Science Series IV: Earth and
Environmental Sciences, vol. 32, Kluwer, Dordrecht, The Netherlands,
pp. 201-213
- Gumbel, E. J. (1958) Statistics of Extremes, Columbia University Press
- Mandelbrot, B. (1963) The variation of certain speculative prices.
The Journal of Business of the University of Chicago 36, 394-419
- Resnick, S. (1987) Extreme
Values, Point Processes, and Regular Variation, Springer.
- Smith, R. L. (2000), Measuring risk
with extreme value theory. In Risk
Management: Theory and Practice, edited by M. Dempster,
Cambridge University Press. Also published as chapter 2 of Extremes and Integrated Risk Management,
edited by P. Embrechts. Risk Books, London, 19-35
3. For questions involving quantiles (percentiles), quantile
regression may be useful.
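As a rough illustration of the idea behind quantile regression: it replaces the squared-error loss of ordinary least-squares regression with an asymmetric "check" (pinball) loss, and with no predictors the minimizer is a sample quantile. The sketch below shows only that intercept-only special case in plain Python. The speed data and function names are hypothetical, and a real analysis would rely on a statistics package rather than this brute-force search.

```python
def check_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau in (0, 1)."""
    if residual >= 0:
        return tau * residual
    return (tau - 1) * residual


def total_check_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(check_loss(v - candidate, tau) for v in values)


def intercept_only_quantile(values, tau):
    """Brute-force search over the data values for the loss minimizer."""
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


# Hypothetical vehicle speeds (miles per hour).
speeds = [52, 55, 58, 60, 61, 63, 64, 66, 70, 88]

print(intercept_only_quantile(speeds, 0.85))
```

With these made-up speeds and tau = 0.85, the search returns 70, the 9th of the 10 ordered values. Quantile regression proper applies the same loss to a model with predictors, so that the fitted quantile can vary with them.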
Last updated June 19, 2015