Pseudoreplication

The term pseudoreplication was coined by Hurlbert (see Note 1) to refer to "the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent."

Here, replication

Heffner et al

Most models for statistical inference require true replication. True replication permits the estimation of variability within a treatment. Without estimating variability within treatments, it is impossible to do statistical inference. Consider, for example, comparing two drugs by trying drug A on person 1 and drug B on person 2. Drugs typically have different effects in different people. So this simple experiment will give us no information about generalizing to people other than the two involved. But if we try each drug on several people, then we can obtain some information about the variability of each drug, and use statistical inference to gain some information on whether or not one drug might be more effective than the other on average.

True replicates are often confused with repeated measures or with pseudoreplicates. The following examples illustrate some of the ways this can occur.

Examples:

1. Suppose a blood-pressure lowering drug is administered to a patient, then the patient's blood pressure is measured twice. This is a repeated measure, not a replication. It can give information about the uncertainty in the measurement process, but not about the variability in the effect of the drug. On the other hand, if the drug were administered to two patients, and each patient's blood pressure was measured once, we can say the treatment has been replicated, and the replication may give some information about the variability in the effect of the drug.

2. A researcher is studying the effect on plant growth of different concentrations of CO

3. Two fifth-grade math curricula are being studied. Two schools have agreed to participate in the study. One is randomly chosen to use curriculum A, the other to use curriculum B. At the end of the school year, the fifth-grade students in each school are tested and the results are used to do a statistical analysis comparing the two curricula. There is no true replication in this study; the students are pseudo-replicates. The schools are the experimental units; they, not the students, are randomly assigned to treatment. Within each school, the test results (and the learning) of the students in the experiment are not independent; they are influenced by the teacher and other school-specific factors (e.g., previous teachers and learning, socioeconomic background of the school, etc.).
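The distinction drawn in Example 1 can be made concrete with a small simulation. This sketch is not part of the original examples: the mean effect, the between-patient standard deviation, and the measurement-error standard deviation are all hypothetical values chosen for illustration, and 200 measurements are used (rather than two) only so that the two spreads are easy to see.

```python
import random
from statistics import stdev

rng = random.Random(42)

# Hypothetical values, chosen only for illustration (units: mmHg).
MEAN_EFFECT = -10.0     # average change in blood pressure caused by the drug
EFFECT_SD = 8.0         # how much the true effect varies from patient to patient
MEASUREMENT_SD = 2.0    # noise in a single blood-pressure reading


def measure(true_change):
    """Return one noisy reading of a patient's true change in blood pressure."""
    return true_change + rng.gauss(0.0, MEASUREMENT_SD)


# Repeated measures: ONE patient, measured many times.
one_patient_effect = rng.gauss(MEAN_EFFECT, EFFECT_SD)
repeated = [measure(one_patient_effect) for _ in range(200)]

# True replication: MANY patients, each measured once.
replicated = [measure(rng.gauss(MEAN_EFFECT, EFFECT_SD)) for _ in range(200)]

sd_repeated = stdev(repeated)      # reflects measurement error only
sd_replicated = stdev(replicated)  # reflects patient-to-patient variation too

print(f"SD of repeated measures on one patient: {sd_repeated:.2f}")
print(f"SD across replicate patients:           {sd_replicated:.2f}")
```

Under these assumptions the spread of the repeated measures tracks only the measurement noise, while the spread across patients also includes the variability in the effect of the drug, which is the quantity needed for inference about the drug.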

Consequences of doing statistical inference using pseudoreplicates rather than true replicates

Variability will probably be underestimated. This will result in:

- Confidence intervals that are too narrow.
- An inflated probability of a Type I error (falsely rejecting a true null hypothesis).
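A small simulation can illustrate the inflated Type I error rate. The sketch below is not from the original text. It assumes a setup like Example 3 (two schools, one per curriculum) in which the curricula have no effect at all, uses hypothetical standard deviations for school-level and student-level variation, and applies a large-sample z-test that wrongly treats the students as independent replicates. The function name and all parameter values are illustrative assumptions.

```python
import random
import statistics
from math import sqrt


def false_positive_rate(school_sd, student_sd=1.0, n_students=50,
                        n_sims=1000, seed=0):
    """Estimate how often a student-level z-test rejects a TRUE null hypothesis.

    Each simulated study has two schools and no curriculum effect. Every
    student's score is a shared school effect plus individual noise.
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)  # two-sided 5% level
    rejections = 0
    for _ in range(n_sims):
        groups = []
        for _school in range(2):
            school_effect = rng.gauss(0.0, school_sd)  # shared by all students
            groups.append([school_effect + rng.gauss(0.0, student_sd)
                           for _ in range(n_students)])
        a, b = groups
        # Standard error computed as if students were independent replicates.
        se = sqrt(statistics.variance(a) / len(a) +
                  statistics.variance(b) / len(b))
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# No school-to-school variation: students really are independent.
rate_independent = false_positive_rate(school_sd=0.0)
# Modest school-to-school variation: students are pseudo-replicates.
rate_pseudoreplicated = false_positive_rate(school_sd=0.5)

print(f"False positive rate, no school effect:   {rate_independent:.3f}")
print(f"False positive rate, with school effect: {rate_pseudoreplicated:.3f}")
```

When the school-level standard deviation is zero, the rejection rate should stay near the nominal 5%. Once schools differ from one another, the student-level standard error ignores that source of variation, so the rejection rate should come out far above 5% even though no treatment effect exists.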

What to do about pseudoreplication

1. Avoid it if at all possible.

Key in doing this is to carefully determine what the experimental/observational units are; then be sure that each treatment is randomly applied to more than one experimental/observational unit. For example, in comparing curricula (Example 3 above), if ten schools participated in the experiment and five were randomly assigned to each treatment (i.e., curriculum), then each treatment would have five replications; this would give some information about the variability of the effect of the different curricula.
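One possible way to analyze such a ten-school design is to reduce each school to a single summary (its mean score) and compare the two groups of five school means. The sketch below is an illustration, not a method prescribed by the original text: the scores are simulated, every numeric value is hypothetical, and a randomization (permutation) test over the 252 possible assignments of schools to curricula is used as the inferential procedure because it mirrors the random assignment of schools.

```python
import random
from itertools import combinations
from statistics import mean

rng = random.Random(1)


def simulate_school_mean(curriculum_effect, n_students=30):
    """Return the mean test score of one simulated school (hypothetical values)."""
    school_effect = rng.gauss(0.0, 5.0)  # teacher, background, and so on
    scores = [70.0 + curriculum_effect + school_effect + rng.gauss(0.0, 10.0)
              for _ in range(n_students)]
    return mean(scores)


# Schools 0-4 use curriculum A; schools 5-9 use curriculum B.
school_means = ([simulate_school_mean(0.0) for _ in range(5)] +
                [simulate_school_mean(3.0) for _ in range(5)])
assigned_a = (0, 1, 2, 3, 4)


def mean_difference(a_indices):
    """Mean of the 'B' schools minus mean of the 'A' schools for one assignment."""
    a = [school_means[i] for i in a_indices]
    b = [school_means[i] for i in range(10) if i not in a_indices]
    return mean(b) - mean(a)


observed = mean_difference(assigned_a)

# Every way the ten schools could have been split five and five.
all_assignments = list(combinations(range(10), 5))

# Two-sided p-value: share of assignments at least as extreme as the observed one.
extreme = sum(1 for idx in all_assignments
              if abs(mean_difference(idx)) >= abs(observed) - 1e-12)
p_value = extreme / len(all_assignments)

print(f"Observed difference in school means (B - A): {observed:.2f}")
print(f"Randomization p-value over {len(all_assignments)} assignments: {p_value:.3f}")
```

The key design choice is that the unit of analysis matches the unit of randomization: there are ten data points (schools), not hundreds (students), so the variability used in the test is the school-to-school variability.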

2. If it is not possible to avoid pseudoreplication, then:

a. Do whatever is possible to minimize lack of independence in the pseudo-replicates. For example, in the study of effect of CO

b. Be careful in analyzing and reporting results. Be open about the limitations of the study; be careful not to over-interpret results. For example, in Example 2, the researcher could calculate what might be called "pseudo-confidence intervals" that would not be "true" confidence intervals, but which could be interpreted as giving a lower bound on the margin of error in the estimate of the quantity being estimated.

c. Consider the study as preliminary (for example, for giving insight into how to plan a better study), or as one study that needs to be combined with many others to give more informative results.

Comments

- Note that in Example 2, there is no way to distinguish between effect of treatment and effect of growth chamber; thus the two factors (treatment and growth chamber) are confounded. Similarly, in Example 3, treatment and school are confounded.
- Example 3 may also be seen as applying the two treatments to two different populations (students in one school and students in the other school).
- Observational studies are particularly prone to pseudoreplication.
- Regression can sometimes account for lack of replication, provided the values of the explanatory variables are close enough to each other. The rough idea is that the responses for nearby values of the explanatory variables can give some estimate of the variability. However, having replicates is better.
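The last comment, on regression, can be illustrated with a short sketch. It is an assumption-laden example rather than part of the original notes: the data are simulated, the true relationship is taken to be a straight line with hypothetical coefficients, and no value of the explanatory variable is repeated. Variability is estimated from the residuals of a least-squares fit.

```python
import random
from math import sqrt

rng = random.Random(7)

TRUE_SD = 2.0  # hypothetical standard deviation of the response around the line

# 200 distinct x values: no x is repeated, so there are no replicates.
x = [i / 10 for i in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, TRUE_SD) for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least-squares estimates for a straight line.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation: variability estimated without replicates,
# relying on the assumption that the straight-line model is correct.
rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
residual_sd = sqrt(rss / (n - 2))

print(f"Estimated slope:       {slope:.3f}")
print(f"Residual SD estimate:  {residual_sd:.3f} (simulated with SD {TRUE_SD})")
```

The estimate depends on the model being right: if the true relationship were curved, lack of fit would be mixed into the residuals and the variability would be overstated. True replicates at the same value of the explanatory variable do not carry that dependence, which is one reason they are preferable.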

Notes:

1. S. H. Hurlbert (1984) Pseudoreplication and the design of ecological field experiments, Ecological Monographs 54(2) pp. 187-211. (The quote is from the abstract.)

2. There are other uses of the word replication -- for example, repeating an entire experiment is also called replication; each repetition of the experiment is called a replicate. This meaning is related to the one given above: If each treatment in an experiment has the same number r of replicates (in the sense given above), then the experiment can be considered as r replicates (in the second sense) of an experiment where each treatment is applied to only one experimental unit.

3. Heffner, Butler, and Reilly (1996) Pseudoreplication revisited, Ecology 77(8) pp. 2558-2562. (The quote is from p. 2558.)