COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
Types of Studies
There are two broad ways of classifying studies involving
statistics: according to how the data are collected, and according to
the purpose of the analysis. A single research paper might include more
than one type of data or more than one type of analysis. Depending on
the question being asked, some types of studies can give stronger
results than others.
I. Classifying according to how the data are collected
In an observational study, data are collected from a naturally
occurring situation.
In an experiment, the researchers deliberately do something (an
"intervention" or "manipulation" or "assignment to treatment") to affect
at least some of the data collected.
Examples:
- Researchers are interested in comparing reading scores
for students in schools with low average family income with scores for
students in schools with high average family income. They choose a
random sample of schools in each category. This is an observational study: the
researchers do nothing to affect either family income or
reading scores.
- Researchers are interested in comparing two methods for
teaching reading. They randomly assign half the schools in their sample
to one method and the other half to the other method. At the end of the
school year, they analyze reading scores of the children in the
schools. This is an experiment: the researchers deliberately decide
which schools use each teaching method.
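The random assignment in the second example can be sketched in Python
as follows. The school names, the labels "method A" and "method B", and
the function randomly_assign are hypothetical, chosen for illustration.
The point of the sketch is that chance, not the researchers or the
schools, decides which method each school uses.

```python
import random


def randomly_assign(schools, seed=None):
    """Randomly split schools into two groups of (nearly) equal size.

    Hypothetical helper for illustration. Returns a dict mapping each
    school to "method A" or "method B". If the number of schools is odd,
    method A gets the extra school.
    """
    rng = random.Random(seed)
    shuffled = list(schools)
    rng.shuffle(shuffled)  # chance alone determines the order
    half = (len(shuffled) + 1) // 2
    assignment = {}
    for school in shuffled[:half]:
        assignment[school] = "method A"
    for school in shuffled[half:]:
        assignment[school] = "method B"
    return assignment


schools = [f"school_{i}" for i in range(10)]
assignment = randomly_assign(schools, seed=1)
for school in schools:
    print(school, assignment[school])
```

Fixing the seed only makes the run reproducible; in a real study the
assignment should not be chosen or adjusted by hand after the fact.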
If the question of interest is whether one thing influences another,
experiments give the strongest results. For example, suppose that in the
second scenario some schools already used one method and other schools
used the other. The researchers might then simply take one random sample
of schools that use the first method and another random sample of
schools that use the second method, and compare results. This would not
be as convincing as a study in which schools were randomly assigned to a
method. For instance, it might be that the best teachers preferred one
method. The study then could not tell whether the higher scores were the
result of having better teachers or were caused by the teaching method.
This type of situation is called confounding: two variables (in this
case, quality of teacher and teaching method) cannot be separated out in
the data used. Experiments are better because random assignment tends to
balance factors such as teacher quality across the groups, which reduces
or eliminates confounding.
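The teacher-quality scenario can be made concrete with a small
simulation. In this hypothetical model (all numbers are invented for
illustration), teaching method has no effect at all on scores: a
school's score depends only on its teacher quality plus random noise.
The function simulate and its parameters are assumptions of the sketch,
not part of any real study.

```python
import random
import statistics


def simulate(n_schools=2000, randomized=False, seed=0):
    """Return mean score under method A minus mean score under method B.

    Hypothetical model: method has NO effect; scores depend only on
    teacher quality (the confounder) plus noise.
    """
    rng = random.Random(seed)
    scores = {"A": [], "B": []}
    for _ in range(n_schools):
        quality = rng.gauss(0, 1)  # teacher quality, standardized
        if randomized:
            method = rng.choice("AB")  # a coin flip decides the method
        else:
            # Observational case: schools with better teachers pick A.
            method = "A" if quality > 0 else "B"
        score = 70 + 5 * quality + rng.gauss(0, 2)
        scores[method].append(score)
    return statistics.mean(scores["A"]) - statistics.mean(scores["B"])


print(f"observational comparison: {simulate(randomized=False):+.2f}")
print(f"randomized comparison:    {simulate(randomized=True):+.2f}")
```

Under these assumptions the observational comparison shows method A
ahead by close to 8 points even though the method does nothing, while
the randomized comparison shows a gap near zero.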
Similarly, in the first example, if the researchers found that students
with low family income had lower reading scores than students with high
family income, they would not be justified in concluding that low family
income causes low reading scores. It might be that the same factors that
caused the families to have low or high income (parents' level of
education, for example) also influenced the children's reading scores.
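A common cause of this kind can produce a strong association on its own.
The sketch below uses a hypothetical model (invented coefficients) in
which parents' education raises both family income and reading scores,
while income has no direct effect on reading. The helper names pearson
and simulate_families are illustrative.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate_families(n=5000, vary_education=True, seed=0):
    """Correlation of income and reading score in a hypothetical model.

    Income does NOT affect reading here; both depend on education.
    With vary_education=False, every family has the same education.
    """
    rng = random.Random(seed)
    incomes, reading = [], []
    for _ in range(n):
        education = rng.gauss(0, 1) if vary_education else 0.0
        incomes.append(50 + 10 * education + rng.gauss(0, 5))
        reading.append(70 + 5 * education + rng.gauss(0, 5))
    return pearson(incomes, reading)


print(f"education varies:    r = {simulate_families():.2f}")
print(f"education held fixed: r = {simulate_families(vary_education=False):.2f}")
```

In this model the correlation is substantial when education varies and
close to zero when it is held fixed, so the association comes entirely
from the common cause.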
Unfortunately, in studying whether family income affects reading scores
(and in many other situations), it is not possible to do an experiment:
researchers cannot randomly assign children's families to high or low
income. In situations where experiments are not possible, there is more
uncertainty in the results. In some such situations, there are methods
that can increase our confidence that a causal relationship is present.
Note: The meaning of "experiment" used here is a technical one; be sure
not to confuse it with other definitions of "experiment." In particular,
"experiment" as used in statistics does not mean "try something to see
what happens."
II. Classifying according to the purpose of the analysis
In exploratory data analysis, the purpose is to investigate the data to
see what patterns they contain. In confirmatory data analysis, a pattern
has been hypothesized before the study, and the purpose of the study is
to confirm or disconfirm the hypothesis.
As discussed above, an experiment is usually best for confirmatory data
analysis. However, experiments are not always possible. Sometimes all
that can be studied is whether the same pattern holds in a new
(preferably randomly selected) data set.
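Checking a pattern on new data can be sketched by randomly splitting one
data set in two, searching for a pattern in one half and checking it in
the other. This split is a stand-in for a genuinely new data set, and
the setup is hypothetical: pure noise, with 50 yes/no traits that by
construction are unrelated to reading score. The function name
explore_then_confirm is illustrative.

```python
import random
import statistics


def explore_then_confirm(n=400, n_traits=50, seed=0):
    """Find the most striking trait in one half, then check it in the other.

    Hypothetical setup: no trait is related to the score.
    Returns (gap in exploration half, gap in confirmation half).
    """
    rng = random.Random(seed)
    score = [rng.gauss(0, 1) for _ in range(n)]
    traits = [[rng.random() < 0.5 for _ in range(n)] for _ in range(n_traits)]

    # Randomly split the students into exploration and confirmation halves.
    rows = list(range(n))
    rng.shuffle(rows)
    explore, confirm = rows[: n // 2], rows[n // 2:]

    def gap(subset, trait):
        """Mean score of students with the trait minus those without it."""
        yes = [score[i] for i in subset if trait[i]]
        no = [score[i] for i in subset if not trait[i]]
        return statistics.mean(yes) - statistics.mean(no)

    # Exploratory step (data snooping): keep the trait with the largest gap.
    best = max(traits, key=lambda trait: abs(gap(explore, trait)))
    return gap(explore, best), gap(confirm, best)


found, checked = explore_then_confirm()
print(f"gap in exploration half:  {found:+.3f}")
print(f"gap in confirmation half: {checked:+.3f}")
```

Because the "best" trait was chosen by searching, its gap in the
exploration half is inflated; in the confirmation half the same trait
usually shows a much smaller gap.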
Researchers sometimes carry out both confirmatory and exploratory data
analysis on the same data set. In this case the previously hypothesized
patterns are sometimes called preplanned comparisons, and the
exploratory analysis is sometimes called data snooping. When statistical
inference is used for both confirmatory and exploratory analysis with
the same data, care is needed to avoid unwarranted claims resulting from
multiple inference: the more tests are performed, the more likely it is
that some will appear significant by chance alone.
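The size of the multiple-inference problem can be computed directly,
assuming m independent tests, each at significance level alpha, with
every null hypothesis true. The chance of at least one spurious
"significant" result is then 1 - (1 - alpha)^m. The sketch below
computes this and checks it by simulation, using the fact that under a
true null hypothesis a continuous test's p-value is uniform on [0, 1).
The Bonferroni adjustment (testing each hypothesis at alpha/m) appears
only as one standard remedy for comparison. The function names are
illustrative.

```python
import random


def familywise_error(m, alpha):
    """P(at least one false positive) among m independent true-null tests."""
    return 1 - (1 - alpha) ** m


def simulate_familywise(m, alpha, trials=20_000, seed=0):
    """Monte Carlo check: each true-null p-value is drawn uniform on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / trials


print(round(familywise_error(20, 0.05), 3))       # 0.642
print(round(familywise_error(20, 0.05 / 20), 3))  # 0.049 (Bonferroni)
print(simulate_familywise(20, 0.05))              # close to 0.642
```

With 20 independent tests at the 0.05 level, the chance of at least one
false positive is about 64%; the Bonferroni threshold brings it below 5%
at the cost of making each individual test more conservative.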