COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them

## Common Mistakes involving Power

1. Rejecting a null hypothesis without considering practical significance.

A study with a large enough sample size will have high enough power to detect minuscule differences that are of no practical significance. Since power typically increases with sample size, practical significance is important to consider when interpreting a statistically significant result. See Type I and II Errors and Sample size calculations to plan an experiment (GraphPad.com) for examples.
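The point can be illustrated with a minimal sketch using a normal approximation to the power of a two-sided two-sample test; the effect size d = 0.05 is hypothetical, chosen to be far below any plausible practical significance:

```python
from statistics import NormalDist

def power_two_sample(d, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test to detect a
    standardized mean difference d with n subjects per group (normal
    approximation; an illustrative sketch, not an exact t-test power)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(d) * (n / 2) ** 0.5  # noncentrality under the alternative
    # probability the test statistic falls beyond either critical value
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

# As n grows, power to "detect" this trivial difference approaches 1.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(power_two_sample(0.05, n), 3))
```

With a hundred thousand subjects per group, the test is nearly certain to declare this trivial difference significant; statistical significance alone says nothing about practical importance.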

2. Accepting a null hypothesis when a result is not statistically significant, without taking power into account.

Since smaller samples yield lower power, a small study may be unable to detect an important difference. If there is strong evidence that the procedure has high power to detect a difference of practical importance, then accepting the null hypothesis may be appropriate (see Note 1); otherwise it is not -- all we can legitimately say then is that we fail to reject the null hypothesis.
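A quick sketch (same normal approximation as above; the numbers are hypothetical) shows how weak a small study can be even against an effect that would matter in practice:

```python
from statistics import NormalDist

def power_two_sample(d, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (normal
    approximation; an illustrative sketch only)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(d) * (n / 2) ** 0.5
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

# A practically important effect (d = 0.5, hypothetical) with only
# 20 subjects per group: power is roughly 0.35.
print(round(power_two_sample(0.5, 20), 2))
```

With power around one third, a non-significant result is entirely unsurprising even when the important difference is real, so it is weak evidence for the null hypothesis.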

3. Being convinced by a research study with low power.

As discussed under Detrimental Effects of Underpowered Studies, underpowered studies are likely to give inconsistent results and are often misleading.

4. Neglecting to do a power analysis/sample size calculation before collecting data.

Without a power analysis, you may end up with a result that does not really answer the question of interest: you may obtain a result that is not statistically significant, yet the study may have been unable to detect a difference of practical significance. You might also waste resources by using a sample size larger than is needed to detect a relevant difference.
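A basic sample size calculation can be sketched as follows (normal approximation for a two-sided two-sample test; the target effect size, power, and significance level are illustrative inputs that should come from the subject matter):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.8, alpha=0.05):
    """Approximate sample size per group so a two-sided two-sample
    z-test has the stated power against standardized difference d
    (normal approximation; a planning sketch, not an exact t-test rule)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# About 63 per group for d = 0.5 at 80% power, alpha = 0.05.
print(n_per_group(0.5))
```

Doing this calculation before gathering data shows both whether the planned study can detect the difference that matters and whether a smaller study would suffice.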

5. Neglecting to take multiple inference into account when calculating power.

If more than one inference procedure is used for a data set, then power calculations need to take that into account. Doing a power calculation for just one inference will result in an underpowered study. For more detail, see:
• Maxwell, S. E. and K Kelley (2011), Ethics and Sample Size Planning, Chapter 6 (pp. 159 - 183) in Panter, A. T. and S. K. Sterba, Handbook of Ethics in Quantitative Methodology, Routledge
• Maxwell, S.E. (2004), The persistence of underpowered studies in psychological research: Causes, consequences, and remedies, Psychological Methods 9 (2), 147 - 163.
For discussion of power analysis when using Efron's version of false discovery rate, see Section 5.4 of B. Efron (2010), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Cambridge, or see his Stats 329 notes
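One common way multiple inference affects power is through the adjusted significance level. A minimal sketch, assuming a Bonferroni correction over a hypothetical number of tests and the same normal-approximation power function as above:

```python
from statistics import NormalDist

def power_two_sample(d, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (normal
    approximation; an illustrative sketch only)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(d) * (n / 2) ** 0.5
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

m = 5  # hypothetical number of tests on the same data set
# Power for one test at the nominal level vs. at the Bonferroni level:
print(round(power_two_sample(0.5, 64, alpha=0.05), 2))      # single test
print(round(power_two_sample(0.5, 64, alpha=0.05 / m), 2))  # Bonferroni
```

A sample size adequate for a single test at the nominal level can leave each Bonferroni-adjusted test noticeably underpowered, so the adjusted level is the one that belongs in the planning calculation.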

6. Using standardized effect sizes rather than considering the particulars of the question being studied.

"Standardized effect sizes" (sometimes called "canned" effect sizes) are expressions involving more than one of the factors that needs to be taken into consideration in considering appropriate levels of Type I and Type II error in deciding on power and sample size. Examples
• Cohen's effect size d is the ratio of the raw effect size (e.g., the difference in means when comparing two groups) to the error standard deviation. But each of these typically needs to be considered individually in designing a study and determining power; it is not necessarily the ratio that is important (see Note 2).
• The correlation (or squared correlation) in regression. The correlation in simple linear regression involves three quantities: the slope, the y standard deviation, and the x standard deviation. Each of these three typically needs to be considered individually in designing the study and determining power and sample size. In multiple regression, the situation may be even more complex.
For specific examples illustrating these points, see:
Lenth, Russell V. (2001) Some Practical Guidelines for Effective Sample Size Determination, American Statistician, 55(3), 187 - 193 (Early draft available here.)
Lenth, Russell V. (2000) Two Sample-Size Practices that I Don't Recommend, comments from panel discussion at the 2000 Joint Statistical Meetings in Indianapolis.
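The point about Cohen's d above can be made concrete with a small sketch (the two scenarios and their numbers are hypothetical):

```python
# Two hypothetical study scenarios with identical Cohen's d = 0.5,
# built from very different raw ingredients.
scenarios = [
    {"raw_diff": 5.0, "sd": 10.0},  # e.g., a 5-point difference, SD 10
    {"raw_diff": 0.5, "sd": 1.0},   # e.g., a 0.5-point difference, SD 1
]
for s in scenarios:
    d = s["raw_diff"] / s["sd"]
    print(s, "-> d =", d)
```

Any power calculation driven by d alone treats these scenarios identically, yet whether a 5-point or a 0.5-point raw difference matters depends entirely on the subject matter; that judgment is lost once the two quantities are collapsed into one ratio.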

7. Confusing retrospective power and prospective power.

• Power as defined above for a hypothesis test is also called prospective or a priori power. It is a conditional probability, P(reject H0 | Ha), calculated without using the data to be analyzed. (In fact, it is best calculated before even gathering the data, and taken into account in the data-gathering plan.)
• Retrospective power is calculated after the data have been collected, using the data.
• Depending on how it is calculated, retrospective power might legitimately be used to estimate the power and sample size for a future study, but it cannot legitimately be used to describe the power of the study from which it is calculated.
• However, some methods of calculating retrospective power compute the power to detect the effect observed in the data -- which misses the whole point of considering practical significance. These methods typically yield simply a transformation of the p-value. See Lenth, Russell V. (2000), Two Sample-Size Practices that I Don't Recommend, for more detail.
• See J. M. Hoenig and D. M. Heisey (2001) "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician 55(1), 19-24 and the Stat Help Page "Retrospective (Observed) Power Analysis" for more discussion and further references.
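The "transformation of the p-value" point can be verified directly in a sketch: for a two-sided z-test, "observed power" computed from the observed effect is a deterministic function of the p-value alone (the function below is illustrative):

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """'Retrospective' power computed from the observed effect in a
    two-sided z-test: it depends only on the p-value, illustrating why
    it adds no information beyond the test result itself (sketch)."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z_obs - z_crit) + nd.cdf(-z_obs - z_crit)

# A result exactly at p = alpha always has "observed power" of about 0.5.
print(round(observed_power(0.05), 2))
print(round(observed_power(0.20), 2))
```

Since the observed power is just a relabeling of the p-value, claims such as "the test was non-significant but the observed power was low, so the study was inconclusive" are circular: low observed power is guaranteed whenever p is large.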

Notes
1. In many cases, however, it would be best to use a test for equivalence.
2. See also Figure 1 of Richard H. Browne, The t-Test p Value and Its Relationship to the Effect Size and P(X>Y), The American Statistician, February 1, 2010, 64(1), p. 31. This shows that, for the two-sample t-test, Cohen's classification of "large" d as 0.8 still gives substantial overlap between the two distributions being compared; d needs to be close to 4 to result in minimal overlap of the distributions.

Last updated August 28, 2012