COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them




Choosing an Outcome Variable¹

In most research, one or more outcome variables are measured. Statistical analysis is done on the outcome measures, and conclusions are drawn from the statistical analysis. One common source of misleading research results is giving inadequate attention to the choice of outcome variables. Making a good choice depends on the particulars of the context, including the research question. There are no one-size-fits-all rules, so this topic is best approached through examples.

Example 1: How to measure "big"?

There are various ways to measure how "big" an object, or person, or animal is. We might measure a person's "bigness" by their height, or by their weight. If we are determining how many people we can fit on an elevator, weight is a better measure than height; deciding on the basis of height would be a silly mistake. If we are deciding how tall to make a ceiling, height is a better measure than weight; deciding on the basis of weight would be a silly mistake.

The US Postal Service uses a combination of measures of "bigness" to determine the maximum allowable size of a parcel: weight must be no more than 70 lbs, and length plus girth must be no more than 130 inches (for Parcel Post; see http://pe.usps.com/text/dmm300/101.htm#wp1034246 for definitions of "length" and "girth" and for requirements for other types of mail service).
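To see why two measures are needed, here is a minimal sketch that checks a parcel against both limits quoted above. It assumes a rectangular box, with "length" taken as the longest side and "girth" as the distance around the box perpendicular to that side, i.e. 2 × (width + height); the USPS page above has the authoritative definitions, and the function names are hypothetical.

```python
# Hypothetical helper: checks a rectangular parcel against the two Parcel Post
# limits quoted in the text (weight <= 70 lb; length + girth <= 130 in).
# Assumption: length = longest side, girth = 2 * (width + height).

MAX_WEIGHT_LB = 70
MAX_LENGTH_PLUS_GIRTH_IN = 130


def length_plus_girth(a: float, b: float, c: float) -> float:
    """Length plus girth of a rectangular box with side lengths a, b, c (inches)."""
    length, width, height = sorted((a, b, c), reverse=True)
    girth = 2 * (width + height)
    return length + girth


def parcel_ok(weight_lb: float, a: float, b: float, c: float) -> bool:
    """True only if the parcel passes BOTH size measures; either alone can fail."""
    return (weight_lb <= MAX_WEIGHT_LB
            and length_plus_girth(a, b, c) <= MAX_LENGTH_PLUS_GIRTH_IN)


if __name__ == "__main__":
    # Long, light box: fails on length + girth (100 + 2*(10+10) = 140), not on weight.
    print(parcel_ok(5, 100, 10, 10))   # False
    # Small, heavy box: fails on weight (80 lb), not on dimensions.
    print(parcel_ok(80, 12, 12, 12))   # False
    # Moderate box: 24 + 2*(18+12) = 84 inches and 20 lb, so it passes both.
    print(parcel_ok(20, 24, 18, 12))   # True
```

Neither measure alone would catch both of the first two parcels, which is the point of combining them.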

Various measures have been used to gauge bigness in the sense of obesity. The gold standard is percent of body fat, which can be measured by underwater weighing or special X-ray techniques. Since these are expensive and impractical, various easier measurements have been devised as approximations. These include waist measurement, waist-to-hip ratio and body mass index (BMI). (For more information, see http://www.holisticonline.com/remedies/weight/weight_obesity-introduction.htm.) BMI is widely used, but it has drawn several valid criticisms. One is that lean, muscular people can be classified as obese. (These include mathematician Keith Devlin and actor Tom Cruise.) At the other end of the spectrum, someone can have a very low BMI yet a high waist-to-hip ratio. (The author of this site is an example. Maybe I have a high percentage of body fat, or maybe I just have small hips, or maybe both.)
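A short sketch shows how two of these proxies can disagree about the same person. BMI is weight in kilograms divided by the square of height in meters. The two people below are hypothetical, and the cut-offs (BMI of 30 or more for obesity; waist-to-hip ratio above 0.90 for men or 0.85 for women) are commonly cited WHO values, used here only for illustration.

```python
# Minimal sketch: two easy-to-compute obesity proxies that can disagree.
# The people are hypothetical; the cut-offs are commonly cited WHO values.


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def waist_to_hip(waist_cm: float, hip_cm: float) -> float:
    return waist_cm / hip_cm


def flags(weight_kg: float, height_m: float, waist_cm: float,
          hip_cm: float, sex: str) -> dict:
    """Report each proxy and whether it crosses its (assumed) cut-off."""
    whr_cutoff = 0.90 if sex == "M" else 0.85
    b = bmi(weight_kg, height_m)
    w = waist_to_hip(waist_cm, hip_cm)
    return {
        "bmi": round(b, 1),
        "obese_by_bmi": b >= 30,
        "whr": round(w, 2),
        "high_whr": w > whr_cutoff,
    }


# Hypothetical lean, muscular man: flagged by BMI (31.6), not by WHR (0.82).
print(flags(100, 1.78, 84, 102, "M"))
# Hypothetical slight woman: low BMI (18.4), yet a high WHR (0.88).
print(flags(50, 1.65, 76, 86, "F"))
```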

Example 2: How to measure "unemployment rate"?

References for this topic:
http://www.bls.gov/news.release/empsit.t12.htm
http://www.bls.gov/cps/cps_htgm.htm#employed 
The Unemployment Rate and Beyond: Alternative Measures of Labor Underutilization, downloadable in pdf from http://www.bls.gov/cps/lfcharacteristics.htm#altmeasures
 
The official US Unemployment Rate is "Total unemployed persons, as a percent of the civilian labor force." However, this raises the question of how "unemployed" and "civilian labor force" are defined, and the definitions are not what you might think. For example, "employed persons" includes "All persons who did at least 15 hours of unpaid work in a family-owned enterprise operated by someone in their household." In 1976, the U.S. Bureau of Labor Statistics introduced several "Alternative measures of labor underutilization" (see references above), and it regularly publishes these other measures of the unemployment rate.

Two morals to draw from this example:
  1. Do not assume you understand what a measure is just because the name makes sense to you. Be sure to find and read the definition carefully; it may not be what you think.
  2. Be especially careful when making comparisons. The same term might be used differently by different authors or in different places. For example, different countries have different definitions of unemployment rate. (See http://www.bls.gov/fls/flsfaqs.htm#laborforcedefinitions)
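Both morals can be made concrete by computing two "unemployment rates" from the same counts. The formulas below paraphrase the BLS descriptions of the official rate (U-3) and its broadest alternative (U-6); check the BLS alternative-measures table in the references above before relying on them. All counts are made up for illustration.

```python
# Minimal sketch: the same (hypothetical) labor-market counts give different
# "unemployment rates" depending on which definition is used.
from dataclasses import dataclass


@dataclass
class LaborCounts:
    unemployed: float                  # unemployed under the official definition
    civilian_labor_force: float        # employed + unemployed
    marginally_attached: float         # want a job, searched in the past year, not in labor force
    part_time_economic_reasons: float  # working part time but want full-time work


def official_rate(c: LaborCounts) -> float:
    """U-3: total unemployed as a percent of the civilian labor force."""
    return 100 * c.unemployed / c.civilian_labor_force


def u6_rate(c: LaborCounts) -> float:
    """U-6: unemployed + marginally attached + part time for economic reasons,
    as a percent of the civilian labor force plus the marginally attached."""
    numerator = c.unemployed + c.marginally_attached + c.part_time_economic_reasons
    denominator = c.civilian_labor_force + c.marginally_attached
    return 100 * numerator / denominator


counts = LaborCounts(unemployed=8_000, civilian_labor_force=100_000,
                     marginally_attached=2_000, part_time_economic_reasons=5_000)
print(f"Official (U-3): {official_rate(counts):.1f}%")  # 8.0%
print(f"Broadest (U-6): {u6_rate(counts):.1f}%")        # 15,000 / 102,000 -> 14.7%
```

The name "unemployment rate" fits both numbers, yet one is nearly twice the other.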

Example 3: What is a good outcome variable for deciding whether cancer treatment in a country has been improving?

A first thought might be "number of deaths in the country from cancer in one year." But the number of deaths might increase simply because the population is increasing. Or it might go down simply because cancer incidence is decreasing, even if treatment has not improved. "Percent of the population that dies of cancer in one year" would take care of the first problem, but not the second.

This example makes the point that a rate is often a better measure than a count.
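A quick calculation with made-up numbers shows how a count and a rate can move in opposite directions. As the text notes, the population rate fixes only the population-growth problem; it still cannot separate better treatment from falling incidence.

```python
# Minimal sketch with hypothetical numbers: cancer deaths rise as a count but
# fall as a rate, because the population grew faster than the deaths did.


def rate_per_100k(events: int, population: int) -> float:
    return 100_000 * events / population


years = {
    # period: (cancer deaths, population) -- hypothetical
    "earlier": (400_000, 200_000_000),
    "later": (450_000, 250_000_000),
}

for label, (deaths, pop) in years.items():
    print(f"{label}: {deaths:,} deaths, {rate_per_100k(deaths, pop):.0f} per 100,000")
# earlier: 400,000 deaths, 200 per 100,000
# later: 450,000 deaths, 180 per 100,000
```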

Example 4: What is a good outcome variable for answering the question, "Do males or females suffer more traffic fatalities?"

In light of the considerations in Example 3, a rate is probably better than a count. But what rate? Deaths per hour traveled or deaths per mile traveled?  
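The choice matters because the two rates can rank the same groups in opposite orders. In the hypothetical numbers below, one group travels faster on average (more miles per hour), which raises its deaths per hour but lowers its deaths per mile. These figures are invented and say nothing about actual male or female fatality rates.

```python
# Minimal sketch with made-up numbers: deaths per hour and deaths per mile
# can disagree about which group "suffers more" traffic fatalities.

groups = {
    # group: (deaths, hours traveled, miles traveled) -- hypothetical
    "group A": (100, 2_000_000, 80_000_000),  # averages 40 mph
    "group B": (60, 1_500_000, 45_000_000),   # averages 30 mph
}


def per_million_hours(deaths: int, hours: int) -> float:
    return 1_000_000 * deaths / hours


def per_100_million_miles(deaths: int, miles: int) -> float:
    return 100_000_000 * deaths / miles


for g, (d, h, m) in groups.items():
    print(f"{g}: {per_million_hours(d, h):.1f} per million hours, "
          f"{per_100_million_miles(d, m):.1f} per 100 million miles")
# group A: 50.0 per million hours, 125.0 per 100 million miles
# group B: 40.0 per million hours, 133.3 per 100 million miles
```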

Example 5: What is a good outcome variable for research on the effect of medication on bone fractures? 

The outcome that is really of interest in this example is "number of fractures" (or possibly "number of hip fractures" or "number of vertebral fractures," etc.), or perhaps (taking into account the lesson from Example 3) "percentage of people in this category who have this type of fracture." But bone density is often used as the outcome in such research. Bone density is correlated with fracture risk, but it is not the same thing as fracture risk.

This is an example of what is called a proxy measure (or a surrogate measure). Sometimes it is impossible, or impractical, to use the real measure, so a proxy measure is better than nothing. (This is the case with the measures of obesity mentioned above.) But it is important not to confuse the proxy measure with the real outcome of interest. Such confusion has happened with bone density: it is now common to talk about osteoporosis and osteopenia as "diseases" in their own right. Yet low bone density is only one factor affecting fracture risk; others include age, weight below 125 lbs., use of steroids (e.g., prednisone) or seizure medications, and high alcohol use. (See http://courses.washington.edu/bonephys/opbmd.html#tz for more information on bone density and other markers of fracture risk.)

 See http://patientsafetyed.duhs.duke.edu/module_a/measurement/proxy_measures.html for more on when proxy measures might be appropriate.
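A small simulation can make the gap between a proxy and the real outcome concrete. This is a sketch under loudly stated assumptions: the logistic model, its coefficients, and the single "other risk" score (standing in for factors such as age, low weight and steroid use) are all invented for illustration and are not a real fracture-risk model.

```python
# Minimal simulation sketch, NOT a real fracture-risk model: all coefficients
# are made up. Fracture risk depends on bone density (the proxy) and on a
# hypothetical score standing in for the other risk factors.
import math
import random

random.seed(1)


def fracture_prob(density: float, other_risk: float) -> float:
    # Hypothetical logistic model: lower density and higher "other risk"
    # both raise the probability of a fracture.
    return 1 / (1 + math.exp(-(-3.0 - 0.8 * density + 0.8 * other_risk)))


people = []
for _ in range(50_000):
    density = random.gauss(0, 1)  # standardized bone density
    other = random.gauss(0, 1)    # standardized score for other factors
    fractured = random.random() < fracture_prob(density, other)
    people.append((density, other, fractured))


def fracture_rate(rows):
    """Share of rows in which a fracture occurred."""
    return sum(f for _, _, f in rows) / len(rows)


low_density = [p for p in people if p[0] < -1]
high_density = [p for p in people if p[0] > 1]
# People with about the same (average) density but different other risk.
mid = [p for p in people if -0.25 < p[0] < 0.25]
mid_low_other = [p for p in mid if p[1] < -1]
mid_high_other = [p for p in mid if p[1] > 1]

print(f"low density:  {fracture_rate(low_density):.3f}")
print(f"high density: {fracture_rate(high_density):.3f}")
print(f"average density, low other risk:  {fracture_rate(mid_low_other):.3f}")
print(f"average density, high other risk: {fracture_rate(mid_high_other):.3f}")
```

In this toy setup, density is clearly associated with fractures, yet people with the same density still differ sharply in fracture rate, so a treatment judged only by its effect on density could be judged on the wrong outcome.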

Statistical Considerations

The examples above all concern the appropriate way to measure a concept of interest. But even when more than one way of measuring makes sense in context, one of them will often be better than the others because of its statistical properties (for example, it may allow an analysis with greater power). For some considerations involving medical clinical trials, see S. Senn and S. Julious (2009), "Measurements in clinical trials: A neglected issue for statisticians?" Statistics in Medicine 28: 3189-3209.
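One illustrative case of such a statistical difference is power. The simulation sketch below compares analyzing a continuous outcome directly with first dichotomizing it into "responder / non-responder" at an arbitrary cut-off. The sample size, effect size and cut-off are hypothetical, and both analyses use simple large-sample z-tests rather than any method from the paper cited above.

```python
# Minimal simulation sketch: power of analyzing a continuous outcome directly
# versus after dichotomizing it. All settings are hypothetical.
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(2009)
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05
N = 50          # subjects per group
EFFECT = 0.5    # true difference in means, in SD units
CUTOFF = 0.0    # hypothetical "responder" threshold


def z_means(x, y):
    """Large-sample z statistic for a difference in means."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(y) - mean(x)) / se


def z_proportions(x, y):
    """Pooled two-proportion z statistic after dichotomizing at CUTOFF."""
    p1 = sum(v > CUTOFF for v in x) / len(x)
    p2 = sum(v > CUTOFF for v in y) / len(y)
    pooled = (p1 * len(x) + p2 * len(y)) / (len(x) + len(y))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(x) + 1 / len(y)))
    if se == 0:  # all or none "responded"; no evidence of a difference
        return 0.0
    return (p2 - p1) / se


def power(test, sims=2000):
    """Fraction of simulated trials in which the test rejects the null."""
    rejections = 0
    for _ in range(sims):
        control = [random.gauss(0, 1) for _ in range(N)]
        treated = [random.gauss(EFFECT, 1) for _ in range(N)]
        if abs(test(control, treated)) > Z_CRIT:
            rejections += 1
    return rejections / sims


power_continuous = power(z_means)
power_dichotomized = power(z_proportions)
print(f"power, continuous outcome:   {power_continuous:.2f}")
print(f"power, dichotomized outcome: {power_dichotomized:.2f}")
```

Under these settings, normal-theory approximations put the two powers at roughly 0.7 and 0.5 respectively: the same underlying measurement, recorded two sensible-looking ways, gives noticeably different chances of detecting a real effect.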
1. Most of the discussion applies to predictor variables as well as outcome variables.

Last updated May 10, 2012