COMMON MISTEAKS
MISTAKES IN
USING STATISTICS: Spotting and Avoiding Them
Detrimental Effects of Underpowered or Overpowered Studies
The most straightforward consequence of underpowered studies (i.e., those with a low probability of detecting an effect of practical importance) is that effects of practical importance are not detected.
But there is a second, more subtle consequence: estimates from underpowered studies have larger variance. For example, in estimating a population mean, the sample means of studies with low power have high variance; in other words, the sampling distribution of sample means is wide. This is illustrated in the following picture, which shows the sampling distributions for a variable with zero mean when sample size n = 25 (red) and when n = 100 (blue). The vertical lines toward the right of each sampling distribution show the cut-off for a one-sided hypothesis test with null hypothesis µ = 0 and significance level alpha = .05. Notice that:
- The sampling distribution for the smaller sample size (n = 25) is wider than the sampling distribution for the larger sample size (n = 100).
- Thus, when the null hypothesis is rejected with the smaller sample size n = 25, the sample mean tends to be noticeably larger than when the null hypothesis is rejected with the larger sample size n = 100.
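
The two observations above can be checked with a short simulation. This is only a sketch under stated assumptions: the population standard deviation is taken to be 1 (the text does not give one), the test is treated as a one-sided z-test, and the function names `cutoff` and `simulated_significant_means` are illustrative, not from the original page. As a shortcut, each sample mean is drawn directly from its sampling distribution rather than by averaging n individual observations.

```python
import random
from statistics import NormalDist, mean

SIGMA = 1.0   # ASSUMED population standard deviation (not given in the text)
ALPHA = 0.05  # one-sided significance level used in the text

def cutoff(n, sigma=SIGMA, alpha=ALPHA):
    """Smallest sample mean that rejects H0: mu = 0 in a one-sided z-test."""
    return NormalDist().inv_cdf(1 - alpha) * sigma / n ** 0.5

def simulated_significant_means(n, trials=200_000, seed=0):
    """Draw sample means under H0 (true mean 0) and keep those that reject H0."""
    rng = random.Random(seed)
    se = SIGMA / n ** 0.5  # standard error of the sample mean
    c = cutoff(n)
    draws = (rng.gauss(0.0, se) for _ in range(trials))
    return [x for x in draws if x > c]

for n in (25, 100):
    sig = simulated_significant_means(n)
    print(f"n={n:3d}  cutoff={cutoff(n):.3f}  "
          f"mean of significant sample means={mean(sig):.3f}")
```

Under these assumptions the cut-off is about 0.329 for n = 25 and about 0.164 for n = 100, so every falsely significant sample mean from the smaller study exceeds a threshold twice as large as the one for the larger study.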
This reflects the general phenomenon that studies with low power are more likely than studies with high power to produce a large estimated effect size (e.g., a large sample mean); see Note 1.
In particular, when there is a Type I error (falsely rejecting the null hypothesis), the effect will appear to be stronger with a small sample size (lower power) than with a large sample size (higher power); see Note 2. This may suggest an effect that is not there. Such a mistake may go undetected because of the File Drawer Problem. Thus, when studies are underpowered, the literature is likely to be inconsistent and often misleading. Here is an example that appears to show this phenomenon in a research survey.
Overpowered studies waste resources. When human or animal subjects are involved (see Note 3), having an overpowered study can be considered unethical, because more subjects than necessary are exposed to the study's risks or burdens. More generally, an overpowered study may be considered unethical if it wastes resources.
A common compromise between overpowered and underpowered designs is to aim for power of about .80. However, power needs to be considered case by case, balancing the risks of Type I and Type II errors.
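
As a rough illustration of what aiming for power of about .80 involves, the sketch below computes power and the required sample size for a one-sided z-test of µ = 0. The assumptions are mine, not the original page's: the population standard deviation `sigma` is known, `delta` stands for a hypothetical smallest effect of practical importance, and the function names `power` and `sample_size` are illustrative.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power(n, delta, sigma=1.0, alpha=0.05):
    """Probability that a one-sided z-test rejects H0: mu = 0 when mu = delta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_alpha - delta * n ** 0.5 / sigma)

def sample_size(delta, target_power=0.80, sigma=1.0, alpha=0.05):
    """Smallest n whose power reaches target_power for a true mean of delta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(target_power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# ASSUMED example: an effect of half a standard deviation matters in practice.
n = sample_size(delta=0.5)
print(n, round(power(n, delta=0.5), 3))
```

With these assumed values the required sample size is 25. Choosing a smaller `delta` or a stricter `alpha` raises the required n, which is one way to see why power has to be weighed case by case.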
Notes:
1. For more discussion, see Andrew Gelman and David Weakliem, Of Beauty, Sex, and Power, The American Scientist, 97(4), July-August 2009, www.stat.columbia.edu/~gelman/research/published/power4r.pdf
2. This sentence was misstated in the original version of this page, but the misstatement was corrected Sept. 23, 2013. Thanks to Stefan Wiens for pointing out the error.
3. For more on ethical considerations in study design for research on animals, see:
- Festing, Michael, Statistics and animals in biomedical research, Significance, Volume 7, Issue 4 (December 2010), available online
- Kilkenny et al. (2010), Improving bioscience research reporting: The ARRIVE guidelines for reporting animal research, PLoS Biology, 8, available online
Last updated Sept 23, 2013