COMMON MISTEAKS
MISTAKES IN
USING STATISTICS: Spotting and Avoiding Them
Inappropriately Designating a Factor as Fixed or Random
In Analysis of Variance and some other methodologies, there
are two types of factors: fixed
effect and random effect. Which type is
appropriate depends on the context of the problem, the questions of
interest, and how the data is gathered. Here are the differences:
Fixed effect factor: Data has
been gathered from all the levels of the factor that are of interest.
Example: The purpose of an
experiment is to compare the effects of three specific dosages of a
drug on the response. "Dosage" is the factor; the three specific
dosages in the experiment are the levels; there is no intent to say
anything about other dosages.
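As a rough illustration of a fixed effect analysis, the sketch below estimates the effect of each of three dosages and computes the usual one-way ANOVA F statistic. The dosage labels and response values are made up for illustration and do not come from any real study.

```python
from statistics import mean

# Hypothetical responses for three specific dosages (illustrative data only).
responses = {
    "10 mg": [4.0, 5.0, 6.0],
    "20 mg": [7.0, 8.0, 9.0],
    "40 mg": [10.0, 12.0, 11.0],
}

all_values = [x for values in responses.values() for x in values]
grand_mean = mean(all_values)

# With a fixed factor, the effect of each individual level is of interest.
level_means = {dose: mean(values) for dose, values in responses.items()}
effects = {dose: m - grand_mean for dose, m in level_means.items()}

# One-way ANOVA F statistic: between-level mean square over within-level mean square.
k = len(responses)
n_total = len(all_values)
ss_between = sum(
    len(values) * (level_means[dose] - grand_mean) ** 2
    for dose, values in responses.items()
)
ss_within = sum(
    (x - level_means[dose]) ** 2
    for dose, values in responses.items()
    for x in values
)
f_statistic = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(effects)
print(f_statistic)
```

The output of interest is one estimated effect per dosage, which is appropriate only because these three dosages are the only ones of interest.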
Random effect factor: The factor has many possible levels, and interest
is in all possible levels, but only a random sample of levels is
included in the data. (See note 1.)
Example: A large manufacturer of
widgets is interested in studying the effect of machine operator on the
quality of the final product. The researcher selects a random sample of
operators from the large number of operators at the various facilities
that manufacture the widgets. The factor is "operator." The analysis
will not estimate the effect of each of the operators in the sample,
but will instead estimate the variability attributable to the factor
"operator".
The analysis of the data is different, depending on whether the factor
is treated as fixed or as random. Consequently, inferences may be
incorrect if the factor is classified inappropriately. Mistakes in
classification are most likely to occur when there is more than
one factor in the study.
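To make the contrast concrete for the widget example, here is a minimal sketch of a random effect analysis. It assumes a balanced design (the same number of widgets measured for each sampled operator) and uses the classical ANOVA (method of moments) estimator, which is only one of several possible estimators. The quality scores and the function name `variance_components` are hypothetical.

```python
from statistics import mean

def variance_components(groups):
    """ANOVA (method of moments) estimates for a balanced one-way random effects design.

    groups: one list of measurements per sampled level, all of equal length.
    Returns (between_level_variance, within_level_variance).
    """
    a = len(groups)        # number of sampled levels (operators)
    n = len(groups[0])     # measurements per level
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with at least 2 levels and 2 values each")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # equals the overall mean when the design is balanced

    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (a * (n - 1))

    # The moment estimator can be negative; truncating at zero is a common convention.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within

# Hypothetical quality scores: 4 randomly sampled operators, 3 widgets each.
scores = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [9.0, 8.0, 10.0],
    [12.0, 13.0, 11.0],
]
operator_variance, residual_variance = variance_components(scores)
print(operator_variance, residual_variance)
```

Note that the result is a single variance attributable to "operator", not a separate effect for each sampled operator.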
Example: Two surgical
procedures are being compared. Patients are randomized to
treatment. Five different surgical teams are used. To prevent possible
confounding of treatment and surgical team, each team is trained in
both procedures, and each team performs equal numbers of surgeries of
each of the two types. Since the purpose of the experiment is to
compare the procedures, the intent is to generalize to other surgical
teams. Thus surgical team should be considered as a random factor, not
a fixed factor.
Comments:
- This example can help in understanding why inferences might
be different for the two classifications of the factor: Asserting that
there is a difference in the results of the two procedures regardless
of the surgical team is a stronger statement than saying that there is
a difference in the results of the two procedures just for the teams in
the experiment.
- Technically, the levels of the random factor (in this
case, the five surgical teams) used in the experiment should be a
random sample of all possible levels. This is in practice usually
impossible, so the random factor analysis is usually used if there is
reason to believe that the teams used in the experiment could
reasonably be a random sample of all surgical teams who might perform
the procedures. However, this assumption needs careful thought to avoid
possible bias. For example, the conclusion would be more sound if it
were limited to surgical teams that were trained in both procedures in
the same manner and to the same extent, and that had the same surgical
experience, as the five teams actually studied.
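The surgical example can be sketched numerically. The code below assumes a balanced two-factor design with replication and follows one common textbook approach: when team is treated as fixed, the procedure mean square is compared with the error mean square; when team is treated as random, it is compared with the procedure-by-team interaction mean square. All outcome values and the function name `procedure_f_ratios` are hypothetical.

```python
from statistics import mean

def procedure_f_ratios(cells):
    """F ratios for the procedure effect in a balanced procedure-by-team design.

    cells[i][j] is the list of outcomes for procedure i performed by team j.
    Returns a dict mapping the team classification to (F, df_numerator, df_denominator).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[mean(cell) for cell in row] for row in cells]
    proc_means = [mean(row) for row in cell_means]
    team_means = [mean([cell_means[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(proc_means)

    ms_proc = b * n * sum((m - grand) ** 2 for m in proc_means) / (a - 1)
    ss_inter = n * sum(
        (cell_means[i][j] - proc_means[i] - team_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ms_inter = ss_inter / ((a - 1) * (b - 1))
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_error = ss_error / (a * b * (n - 1))

    return {
        # Team fixed: conclusions apply only to the teams in the experiment.
        "team_fixed": (ms_proc / ms_error, a - 1, a * b * (n - 1)),
        # Team random: conclusions are meant to generalize to other teams.
        "team_random": (ms_proc / ms_inter, a - 1, (a - 1) * (b - 1)),
    }

# Hypothetical outcomes: 2 procedures x 5 teams x 2 patients per cell.
outcomes = [
    [[5.0, 7.0], [6.0, 8.0], [4.0, 6.0], [7.0, 9.0], [5.0, 7.0]],     # procedure A
    [[8.0, 10.0], [6.0, 8.0], [9.0, 11.0], [7.0, 9.0], [10.0, 12.0]], # procedure B
]
ratios = procedure_f_ratios(outcomes)
f_fixed, df1_fixed, df2_fixed = ratios["team_fixed"]
f_random, df1_random, df2_random = ratios["team_random"]
print(ratios)
```

With these made-up numbers, the F ratio is 16.9 on (1, 10) degrees of freedom when team is treated as fixed, but about 5.37 on (1, 4) degrees of freedom when team is treated as random. The evidence for a procedure difference looks considerably weaker under the random classification, which is consistent with the first comment above: generalizing beyond the five teams is the stronger claim.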
Additional Comments about Fixed and Random Factors
- The standard methods for analyzing random effects
models assume that the random factor has infinitely many levels, but
usually still work well if the total number of levels of the random
factor is at least 100 times the number of levels observed in the data.
Situations where the total number of levels of the random factor is
less than 100 times the number of levels observed in the data require
special "finite population" methods.
- An interaction term involving both a fixed and a random
factor should be considered a random factor.
- A factor that is nested in a random factor should be
considered random.
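The rule of thumb in the first comment above can be written as a small check. This is only a sketch of that rule; the function name and the example counts are hypothetical, and the threshold of 100 is taken directly from the text.

```python
def needs_finite_population_methods(levels_observed, levels_total, ratio=100):
    """Return True if the total number of levels is less than `ratio` times
    the number of levels observed, i.e. the rule of thumb suggests that
    special finite population methods are required."""
    if levels_observed <= 0 or levels_total < levels_observed:
        raise ValueError("expected 0 < levels_observed <= levels_total")
    return levels_total < ratio * levels_observed

# Hypothetical counts: 5 teams observed out of 2000 possible teams, then out of 300.
large_population = needs_finite_population_methods(5, 2000)
small_population = needs_finite_population_methods(5, 300)
print(large_population, small_population)
```

In the first case the standard methods would usually be expected to work well; in the second, the rule of thumb points to finite population methods.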
1. Usage of "random" in
this and similar contexts is not uniform. For
example, some authors, in discussing hierarchical (multilevel)
analysis, may refer to an intercept as "random" when interest is
restricted to a finite population with all members present in the data
(e.g., the various states of the U. S. A.), but the intercept is
allowed to be different for different members of the population. Using
the term "variable intercept" can help emphasize that, although the
intercept is allowed to vary, interest is only in the finite
population, with no implication of inference beyond that population.
Last updated Jan 20, 2013