COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
Choosing an Outcome Variable [1]
In most research, one or more outcome variables
are measured. Statistical analysis is done on the outcome measures, and
conclusions are drawn from the statistical analysis. One
common source of misleading research results is giving inadequate
attention to the choice of outcome variables. Making a good choice
depends on the particulars of the
context, including the research question. There are no
one-size-fits-all rules. So this topic can best be approached by
examples.
Example 1: How to measure "big"?
There are various ways to measure how "big" an object, person, or
animal is. We might measure a person's "bigness" by their height, or by
their weight. If we are determining how many people we can fit on an
elevator, weight is a better measure than height; deciding on the basis
of height would be a silly mistake. If we are deciding how tall to make
a ceiling, height is a better measure than weight; deciding on the
basis of weight would be a silly mistake.
The US Postal Service uses a combination of measures of "bigness" for
determining the maximum allowable size of a parcel: weight must
be no more than 70 lbs, and length plus girth must be no more than 130
inches (for Parcel Post; see http://pe.usps.com/text/dmm300/101.htm#wp1034246
for definitions of "length" and "girth" and for requirements for other
types of mail service).
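As a rough illustration of how two measures of "bigness" are combined, here is a minimal Python sketch of the two limits quoted above. The function name and the sample parcels are hypothetical, and the sketch assumes a rectangular box whose "length" is its longest side and whose "girth" is the distance around the box perpendicular to the length; check the USPS definitions before relying on it.
```python
def parcel_within_limits(weight_lb, side_a_in, side_b_in, side_c_in,
                         max_weight_lb=70.0, max_length_plus_girth_in=130.0):
    """Return True if a rectangular parcel satisfies both limits quoted above."""
    # Assumption: length = longest side; girth = distance around the box
    # measured perpendicular to the length.
    length, mid, short = sorted((side_a_in, side_b_in, side_c_in), reverse=True)
    girth = 2 * (mid + short)
    return (weight_lb <= max_weight_lb
            and length + girth <= max_length_plus_girth_in)

# Hypothetical parcels: each fails (or passes) for a different reason.
light_but_bulky = parcel_within_limits(10, 60, 20, 20)  # 60 + 80 = 140 inches
heavy_but_small = parcel_within_limits(80, 12, 12, 12)  # 80 lbs
acceptable = parcel_within_limits(40, 40, 20, 20)       # 40 lbs, 40 + 80 = 120 inches

print(light_but_bulky, heavy_but_small, acceptable)
```
Neither weight alone nor dimensions alone would reject both of the first two parcels, which is why a combination of measures is used.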
Various measures have been used for bigness in
the sense of obesity. The gold standard is percent of body fat, which
can be measured by underwater weighing or special X-ray
techniques. Since these are expensive and impractical, various easier
measurements have been devised as approximations. These include waist
measurement, waist-to-hip ratio, and body mass index (BMI).
(For more information, see http://www.holisticonline.com/remedies/weight/weight_obesity-introduction.htm.)
BMI is often used, but it is open to several valid criticisms. One is that some lean,
muscular people are classified as obese. (Examples include mathematician Keith Devlin
and actor Tom Cruise.) At the other end of the spectrum, someone
can have very low BMI yet high waist-to-hip ratio. (The author of this
site is an example. Maybe I have a high percentage of body fat, or
maybe I just have small hips, or maybe both.)
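To see how the approximations can disagree, here is a small Python sketch. It assumes the standard BMI formula (weight in kilograms divided by the square of height in meters) and the commonly used cutoff of 30 for "obese"; neither is stated in the text above, and the two people are invented for illustration only.
```python
OBESE_BMI_CUTOFF = 30.0  # commonly used cutoff; an assumption, not from the text

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def waist_to_hip_ratio(waist_cm, hip_cm):
    """Waist circumference divided by hip circumference."""
    return waist_cm / hip_cm

# Hypothetical people, not real measurements.
muscular_bmi = bmi(weight_kg=95.0, height_m=1.75)   # lean but heavy
slim_bmi = bmi(weight_kg=50.0, height_m=1.70)       # very light
slim_whr = waist_to_hip_ratio(waist_cm=80.0, hip_cm=85.0)

print(f"muscular person: BMI {muscular_bmi:.1f}, "
      f"classified obese: {muscular_bmi >= OBESE_BMI_CUTOFF}")
print(f"slim person: BMI {slim_bmi:.1f}, waist-to-hip ratio {slim_whr:.2f}")
```
BMI uses only weight and height, so it cannot tell muscle from fat, and it says nothing about where the weight is carried.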
Example 2: How to measure "unemployment rate"?
References for this topic:
The official US Unemployment Rate is "Total unemployed persons, as a
percent of the civilian labor force." However, this leaves open how
"unemployed" and "civilian labor force" are defined. These
definitions are not what you might think. For example, "employed
persons" includes "All persons who did at least 15 hours of unpaid work
in a family-owned enterprise operated by someone in their household."
In 1976, the U.S. Department of Labor introduced several "Alternative measures
of labor underutilization" (see references above), and it regularly
publishes these other measures of the unemployment rate.
Two morals to draw from this example:
- Do not assume you understand what a measure is just because
the name makes sense to you. Be sure to find and read the definition
carefully; it may not be what you think.
- Be especially careful when making comparisons. The same term
might be used differently by different authors or in different places.
For example, different countries have different definitions of
unemployment rate. (See http://www.bls.gov/fls/flsfaqs.htm#laborforcedefinitions)
Example 3: What is a good outcome variable for deciding whether cancer treatment in a country has been improving?
A first thought might be "number of deaths in the country from cancer
in one year." But the number of deaths might increase simply because the
population is increasing. Or it might go down if cancer incidence is
decreasing. "Percent of the population that dies of cancer in one
year" would take care of the first problem, but not the second.
This example makes the point that a rate
is often a better measure than a count.
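The arithmetic behind this point can be sketched in a few lines of Python. The populations and death counts below are invented purely for illustration; they are not real data for any country.
```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 population."""
    return deaths * 100_000 / population

# Hypothetical figures: the population grows between the two years.
years = {
    "year 1": {"deaths": 20_000, "population": 10_000_000},
    "year 2": {"deaths": 21_600, "population": 12_000_000},
}

rates = {label: rate_per_100k(**figures) for label, figures in years.items()}

for label, figures in years.items():
    print(f"{label}: {figures['deaths']} deaths, "
          f"{rates[label]:.1f} per 100,000")
```
In this made-up example the count rises from 20,000 to 21,600 while the rate falls from 200 to 180 per 100,000, so the count and the rate point in opposite directions. As noted above, the rate still does not separate a change in treatment from a change in incidence.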
Example 4: What is a good outcome variable for answering the question, "Do males or females suffer more traffic fatalities?"
In light of the considerations in Example 3, a rate is probably better
than a count. But what rate? Deaths per hour traveled or deaths per
mile traveled?
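The choice of denominator can change the answer. The Python sketch below uses invented figures for two hypothetical groups (labeled A and B so as not to suggest real results for males or females) to show that the two rates can rank the groups in opposite orders when the groups travel at different average speeds.
```python
BILLION = 1_000_000_000

# Hypothetical totals for two groups of travelers; not real data.
groups = {
    "group A": {"deaths": 300, "miles": 30 * BILLION, "hours": 800_000_000},
    "group B": {"deaths": 150, "miles": 12 * BILLION, "hours": 500_000_000},
}

def deaths_per_billion(deaths, exposure):
    """Deaths per billion units of exposure (miles or hours)."""
    return deaths * BILLION / exposure

per_mile = {g: deaths_per_billion(v["deaths"], v["miles"]) for g, v in groups.items()}
per_hour = {g: deaths_per_billion(v["deaths"], v["hours"]) for g, v in groups.items()}

for g in groups:
    print(f"{g}: {per_mile[g]:.1f} deaths per billion miles, "
          f"{per_hour[g]:.1f} deaths per billion hours")
```
With these made-up numbers, group B has the higher rate per mile but group A has the higher rate per hour. Which rate is appropriate depends on the research question.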
Example 5: What is a good outcome variable for research on the effect of medication on bone fractures?
The outcome that is really of interest in this example is "number of
fractures" (or possibly "number of hip fractures" or "number of
vertebral fractures," etc.), or perhaps (taking into account the lesson
from Example 3) "percentage of people in this category who have this
type of fracture." But often, bone density is taken as an
outcome for such research. Bone density is correlated with fracture risk, but
is not the same as fracture risk.
This is an example of what is called a proxy measure (or a surrogate measure). Sometimes it is
impossible (or not possible for practical purposes) to use
the real measure, so proxy measures are better than nothing. (This is
the case with the measures of obesity mentioned above.) But it is
important not to confuse the proxy measure with the real outcome of
interest. Such confusion has happened with bone density -- it is now
common to talk about osteoporosis and osteopenia as "diseases" in their
own right. Yet they are only one factor affecting risk for fracture;
others include age, weight below 125 lbs., use of steroids (e.g.,
prednisone) or seizure medications, and high alcohol use. (See http://courses.washington.edu/bonephys/opbmd.html#tz
for more information on bone density and other markers of fracture
risk.)
See
http://patientsafetyed.duhs.duke.edu/module_a/measurement/proxy_measures.html
for more on when proxy measures might be appropriate.
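The gap between a proxy and the real outcome can be illustrated with a small simulation. The Python sketch below uses entirely synthetic data: the model, its coefficients, and the function names are assumptions made up for illustration, not estimates from any study. Fracture probability is made to depend on a standardized "bone density" score and, equally, on a second score standing in for all other risk factors.
```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate(n=5000, seed=0):
    """Generate synthetic (density, fractured) pairs under an assumed model."""
    rng = random.Random(seed)
    density, fractured = [], []
    for _ in range(n):
        d = rng.gauss(0, 1)      # standardized bone density (synthetic)
        other = rng.gauss(0, 1)  # all other risk factors combined (synthetic)
        # Assumed logistic model: lower density and higher "other" raise risk.
        p = 1 / (1 + math.exp(-(-2.0 - 1.0 * d + 1.0 * other)))
        density.append(d)
        fractured.append(1 if rng.random() < p else 0)
    return density, fractured

density, fractured = simulate()
r = pearson(density, fractured)
print(f"correlation between density and fracture: {r:.2f}")
```
Under these assumptions the correlation comes out negative but far from -1: lower density goes with more fractures, yet density alone leaves much of the fracture risk unexplained.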
Statistical Considerations
The examples above all concern what is an appropriate way to
measure a concept of interest. There may be more than one way of
measuring that makes sense in context, yet one of these ways may be
better than the others for its statistical properties (e.g., it may provide
a way to analyze the data that has greater power
than another way). For some considerations involving medical clinical
trials, see S. Senn and S. Julious (2009), Measurements in clinical
trials: A neglected issue for statisticians? Statistics in Medicine 28:
3189-3209.
1. Most of the discussion applies to predictor variables as well as outcome variables.
Last updated May 10, 2012