COMMON MISTEAKS MISTAKES IN USING STATISTICS: Spotting and Avoiding Them

Types of Studies

There are two broad ways of classifying studies involving statistics: according to how the data are collected, and according to the purposes of the analysis of the data. A single research paper might include more than one type of data or more than one type of analysis. Depending on the question being asked, some types of studies can provide stronger evidence than others.

I. Classifying according to how the data are collected

In an observational study, data are collected from a naturally occurring situation. In an experiment, the researchers deliberately do something (an "intervention" or "manipulation" or "assignment to treatment") to affect at least some of the data collected.

Examples:
• Researchers are interested in comparing reading scores for students in schools with low average family income with scores for students in schools with high average family income. They choose a random sample of schools in each category. This is an observational study: the researchers do nothing to affect either family income or reading scores.
• Researchers are interested in comparing two methods for teaching reading. They randomly assign half the schools in their sample to one method and the other half to the other method. At the end of the school year, they analyze reading scores of the children in the schools. This is an experiment: the researchers deliberately decide which schools, and hence which students, receive each teaching method.

If the question of interest is whether one thing influences another, experiments give the strongest evidence. Suppose that, in the second scenario, some schools already used one method and other schools used the other. The researchers might then simply take one random sample of schools that use the first method and another random sample of schools that use the second method, and compare results. This would not be as convincing as a study in which schools were randomly assigned to a method. For example, it might be that the best teachers preferred one method. The study then would not show whether the higher scores were caused by having better teachers or by the teaching method. This type of situation is called confounding: the effects of two variables (in this case, quality of teacher and teaching method) cannot be separated in the data used. Experiments are better because random assignment tends to balance other factors (including ones the researchers have not thought of) between the groups, which reduces or eliminates confounding.
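
To make the teacher-quality example concrete, here is a minimal Python simulation under made-up assumptions: each school has a hypothetical teacher-quality value, reading scores depend on teacher quality plus noise, and the teaching method itself has no effect at all. The function name simulate_schools, the numbers, and the rule that schools with better teachers tend to choose method A are illustrative assumptions, not data from any real study.

```python
import random
import statistics


def simulate_schools(n_schools, randomize, seed=0):
    """Return mean reading score of method-A schools minus that of method-B schools.

    Hypothetical model (an assumption for illustration only): each school has a
    teacher-quality value, and its reading score depends on teacher quality plus
    noise, but NOT on which teaching method it uses.
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n_schools)]

    if randomize:
        # Experiment: shuffle the schools and assign the first half to method A.
        order = list(range(n_schools))
        rng.shuffle(order)
        chosen = set(order[: n_schools // 2])
        uses_a = [i in chosen for i in range(n_schools)]
    else:
        # Observational study: schools with better teachers tend to choose method A.
        uses_a = [q + rng.gauss(0, 0.5) > 0 for q in quality]

    # Scores depend on teacher quality only; the method has no effect.
    scores = [50 + 10 * q + rng.gauss(0, 5) for q in quality]

    mean_a = statistics.mean(s for s, a in zip(scores, uses_a) if a)
    mean_b = statistics.mean(s for s, a in zip(scores, uses_a) if not a)
    return mean_a - mean_b


# The method has no effect, yet self-selection produces a large gap...
print(round(simulate_schools(2000, randomize=False), 1))
# ...while random assignment gives a gap close to zero.
print(round(simulate_schools(2000, randomize=True), 1))
```

In this sketch, letting schools choose their own method produces a large score gap even though the method does nothing; randomly assigning the method shrinks the gap toward zero, apart from chance variation.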

Similarly, in the first example, if the researchers found that students with low family income had lower reading scores than students with high family income, they would not be justified in concluding that low family income causes low reading scores. It might be that the same factors that caused the families to have low or high income (one possibility might be parents' level of education) also influenced the children's reading scores.

Unfortunately, in studying whether family income affects reading scores (and in many other situations), it is not possible to do an experiment: researchers cannot randomly assign children's families to high or low income. Thus, in situations where experiments are not possible, there is more uncertainty in the results. In some such situations, there are methods that can increase our confidence that some causality is taking place.

Note: The meaning of "experiment" used here is a technical one; be sure not to confuse it with other definitions of "experiment." In particular, "experiment" as used in statistics does not mean "try something to see what happens."

II. Classifying according to the purpose of the analysis

In exploratory data analysis, the purpose is to investigate the data to see what patterns can be seen. In confirmatory data analysis, a pattern has been hypothesized before the study, and the purpose of the study is to confirm or disconfirm the hypothesis.

As noted above, an experiment is in most cases best for confirmatory data analysis. However, experiments are not always possible. Sometimes all that can be studied is whether the same pattern holds in a new (preferably randomly selected) data set.

Sometimes researchers engage in both confirmatory and exploratory data analysis with the same data set. In this case, the previously hypothesized patterns are sometimes called preplanned comparisons, and the exploratory analysis is sometimes called data snooping. When statistical inference is used for both confirmatory and exploratory analysis with the same data, care needs to be taken to avoid making unwarranted claims resulting from multiple inference: the more comparisons that are tested, the more likely it is that some will appear statistically significant by chance alone.
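
The danger of multiple inference can be illustrated by data snooping on pure noise. The sketch below (hypothetical function snoop, made-up sample sizes) runs many two-group comparisons in which there is no real difference, using a simple two-sided z-test that assumes the population standard deviation is known to be 1. It counts how many comparisons look "significant" at the 5% level and, for contrast, how many survive a Bonferroni correction (dividing the significance level by the number of comparisons). Bonferroni is only one common and fairly conservative adjustment; it is not the only way to take multiple inference into account.

```python
import random
import statistics
from statistics import NormalDist


def snoop(n_comparisons, n_per_group=30, alpha=0.05, seed=0):
    """Run many two-group comparisons on pure noise and count 'significant' ones.

    Every group is drawn from the same N(0, 1) population, so every null
    hypothesis is true. Uses a two-sided z-test that assumes the population
    standard deviation is known to be 1. Returns (unadjusted, Bonferroni) counts.
    """
    rng = random.Random(seed)
    standard_normal = NormalDist()
    se = (2 / n_per_group) ** 0.5  # standard error of a difference in two means
    p_values = []
    for _ in range(n_comparisons):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (statistics.mean(a) - statistics.mean(b)) / se
        p_values.append(2 * (1 - standard_normal.cdf(abs(z))))
    unadjusted = sum(p < alpha for p in p_values)
    bonferroni = sum(p < alpha / n_comparisons for p in p_values)
    return unadjusted, bonferroni


print(snoop(1000))  # (count "significant" unadjusted, count after Bonferroni)
```

With 1000 comparisons, roughly 5% (about 50) will typically appear significant even though every null hypothesis is true; after the Bonferroni adjustment, usually none do.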