A study with a large enough sample size will have high enough power to detect minuscule differences that are not of practical significance. Since power typically increases with increasing sample size, practical significance is important to consider when planning a study. See the pages Type I and II Errors and Sample size calculations to plan an experiment (GraphPad.com) for examples.
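To make this concrete, here is a minimal sketch (mine, not the page's) of an approximate power calculation for a two-sided two-sample z-test with known standard deviation; the function name and the numbers are illustrative assumptions. With a very large sample, even a trivially small difference is detected with high probability:

```python
# Approximate power of a two-sided two-sample z-test with n per group,
# using power ~ Phi(delta/(sigma*sqrt(2/n)) - z_{1-alpha/2}).
# Illustrative sketch; names and numbers are hypothetical.
from math import sqrt
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n, alpha=0.05):
    """Approximate power to detect a true difference `delta`, n per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * sqrt(2 / n)  # standard error of the difference in means
    # The tiny probability of rejecting in the wrong tail is ignored.
    return NormalDist().cdf(delta / se - z_crit)

# A difference of 0.05 standard deviations is rarely of practical importance,
# yet with 10,000 observations per group the test flags it over 90% of the time:
print(power_two_sample_z(delta=0.05, sigma=1.0, n=10_000))
```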

2. Accepting a null hypothesis when a result is not statistically significant, without taking power into account.

Since smaller samples yield lower power, a small sample may fail to detect an important difference. If there is strong evidence that the procedure has enough power to detect a difference of practical importance, then accepting the null hypothesis may be appropriate^{1}; otherwise it is not: all we can legitimately say is that we fail to reject the null hypothesis.
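A small sketch under the same kind of illustrative assumptions (a two-sided two-sample z-test with known sigma; my example, not the page's) shows how little power a small study has against even a substantial difference, which is why "accept H_{0}" is unwarranted:

```python
# Illustrative sketch: power of a two-sided two-sample z-test, n per group.
from math import sqrt
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n, alpha=0.05):
    """Approximate power; the wrong-tail rejection probability is ignored."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / (sigma * sqrt(2 / n)) - z_crit)

# With only 10 observations per group, a half-standard-deviation difference
# (often of real practical importance) is detected only about 1 time in 5,
# so a non-significant result says very little:
print(power_two_sample_z(delta=0.5, sigma=1.0, n=10))
```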

3. Being convinced by a research study with low power.

As discussed under Detrimental Effects of Underpowered Studies, underpowered studies are likely to give inconsistent results and are often misleading.

4. Neglecting to do a power analysis/sample size calculation before collecting data.

Without a power analysis, you may end up with a result that does not really answer the question of interest: the study may yield a result that is not statistically significant while lacking the power to detect a difference of practical significance, so the non-significant result is uninformative. You might also waste resources by using a sample size larger than needed to detect a relevant difference.
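One common back-of-the-envelope calculation, sketched here for a two-sided two-sample z-test with known sigma (an illustrative assumption, not this page's own example), inverts the power formula to give the required sample size per group:

```python
# Illustrative sketch: n per group ~ 2*((z_{1-alpha/2} + z_{power}) * sigma / delta)**2
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, power=0.8, alpha=0.05):
    """Approximate sample size per group for a two-sided two-sample z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# For 80% power to detect a half-standard-deviation difference at alpha = 0.05:
print(n_per_group(delta=0.5, sigma=1.0))  # 63 per group
```

Deciding in advance what `delta` is practically significant is exactly the step that a standardized effect size (see item 6 below) tends to obscure.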

5. Neglecting to take multiple inference into account when calculating power.

If more than one inference procedure is used for a data set, then power calculations need to take that into account: the significance level of each individual inference must be adjusted (for example, by a Bonferroni correction), which lowers the power at a given sample size. Doing a power calculation for just one inference will result in an underpowered study.
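As a hedged sketch (same illustrative two-sample z-test with known sigma as the examples above; Bonferroni is just one common, conservative adjustment), here is how a multiple-comparison correction reduces per-test power at a fixed sample size:

```python
# Illustrative sketch: Bonferroni adjustment and power, two-sample z-test.
from math import sqrt
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test, n per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / (sigma * sqrt(2 / n)) - z_crit)

n, k = 64, 5  # 64 per group, 5 planned tests
single = power_two_sample_z(0.5, 1.0, n)                    # alpha = 0.05
adjusted = power_two_sample_z(0.5, 1.0, n, alpha=0.05 / k)  # Bonferroni
# The per-test power under the adjusted alpha is noticeably lower,
# so the sample size must be recalculated for the adjusted level:
print(single, adjusted)
```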

6. Using standardized effect sizes rather than considering the particulars of the question being studied.

Further reading on underpowered studies:

- Maxwell, S. E. and Kelley, K. (2011), "Ethics and Sample Size Planning," Chapter 6 (pp. 159-183) in Panter, A. T. and Sterba, S. K. (eds.), Handbook of Ethics in Quantitative Methodology, Routledge.
- Maxwell, S. E. (2004), "The persistence of underpowered studies in psychological research: Causes, consequences, and remedies," Psychological Methods, 9(2), 147-163.

"Standardized effect sizes" (sometimes called "canned" effect sizes) are expressions that combine more than one of the factors that need to be considered individually when choosing appropriate levels of Type I and Type II error and deciding on power and sample size. Examples:

- Cohen's effect size d is the ratio of the raw effect size (e.g., the difference in means when comparing two groups) to the error standard deviation. But each of these typically needs to be considered individually in designing a study and determining power; it's not necessarily the ratio that's important.
- The correlation (or squared correlation) in regression. The correlation in simple linear regression involves three quantities: the slope, the y standard deviation, and the x standard deviation. Each of these typically needs to be considered individually in designing the study and determining power and sample size. In multiple regression, the situation may be even more complex.
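The point about Cohen's d can be sketched numerically (using the same illustrative z-test power function as the earlier examples; mine, not the page's): two studies with very different raw differences and standard deviations, but the same d, get identical computed power, so planning on d alone hides the individual quantities that matter:

```python
# Illustrative sketch: power depends only on the ratio d = delta/sigma.
from math import sqrt
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test, n per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / (sigma * sqrt(2 / n)) - z_crit)

# Two quite different situations with the same Cohen's d = 0.5:
a = power_two_sample_z(delta=0.5, sigma=1.0, n=50)   # small raw difference
b = power_two_sample_z(delta=5.0, sigma=10.0, n=50)  # tenfold larger scale
# Equal up to rounding: the calculation cannot distinguish the two designs,
# even though the practical meaning of the raw quantities differs greatly.
print(round(a, 6), round(b, 6))
```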

Lenth, Russell V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55(3), 187-193.

Lenth, Russell V. (2000), "Two Sample-Size Practices that I Don't Recommend," comments from panel discussion at the 2000 Joint Statistical Meetings in Indianapolis.

7. Confusing retrospective power and prospective power.

- Power as defined above for a hypothesis test is also called prospective or a priori power. It is a conditional probability, P(reject H_{0} | H_{a} is true), calculated without using the data to be analyzed. (In fact, it is best calculated before even gathering the data, and taken into account in the data-gathering plan.)
- Retrospective power is calculated after the data have been collected, using those data.
- Depending on how retrospective power is calculated, it might legitimately be used to estimate the power and sample size for a future study, but it cannot legitimately be used to describe the power of the study from which it is calculated.
- However, some methods of calculating retrospective power compute the power to detect the effect observed in the data, which misses the whole point of considering practical significance. These methods typically yield simply a transformation of the p-value. See Lenth, Russell V. (2000), "Two Sample-Size Practices that I Don't Recommend," for more detail.
- See J. M. Hoenig and D. M. Heisey (2001), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician, 55(1), 19-24, and the Stat Help Page "Retrospective (Observed) Power Analysis" for more discussion and further references.
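The "transformation of the p-value" point can be sketched for a two-sided z-test (my illustration, not Lenth's code): the observed statistic can be recovered from p, so "observed power" is a function of p alone, and p = alpha always gives observed power of about one half, regardless of what the data were:

```python
# Illustrative sketch: 'observed power' computed from the p-value alone.
from statistics import NormalDist

def observed_power_from_p(p, alpha=0.05):
    """'Observed power' of a two-sided z-test as a function of the p-value."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p / 2)        # |z| recovered from the p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Power evaluated at the observed effect (wrong tail ignored):
    return nd.cdf(z_obs - z_crit)

# p exactly at alpha maps to observed power 0.5; the data add nothing:
print(observed_power_from_p(0.05))
```

Since this number is determined by p, reporting it conveys no information beyond the p-value itself, which is the fallacy Hoenig and Heisey describe.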

Notes

1. In many cases, however, it would be best to use a test for equivalence. For more information, see:

- Hoenig, John M. and Heisey, Dennis M. (2001), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician, 55(1), 19-24.
- Statistical Tests for Equivalence, GraphPad.com.
- Lachenbruch, Peter A., Equivalence Testing.

Last updated August 28, 2012