COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
Summary Statistics for Distributions with Large Variability
A measure of center (such as the mean or median) of a random variable gives limited information if the variability of the distribution is large. So in most cases, we need some measure of variability as well as a measure of center.
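As a small illustration, here are two made-up data sets (not from any real study) that share the same mean and median but differ greatly in spread. Reporting only the center would make them look identical. The helper `summarize` is a hypothetical name introduced for this sketch.

```python
import statistics

def summarize(data):
    """Return (mean, median, sample standard deviation) of a data set."""
    return statistics.mean(data), statistics.median(data), statistics.stdev(data)

# Two made-up data sets with the same center but very different spread.
tight = [49, 50, 50, 50, 51]
spread = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("spread", spread)):
    mean, median, sd = summarize(data)
    # Same mean and median; only the standard deviation tells them apart.
    print(f"{name:>6}: mean={mean}, median={median}, SD={sd:.2f}")
```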
The standard deviation is one commonly used measure of variability. It is especially appropriate for normal or near-normal distributions, but less helpful for skewed distributions, where deviations on one side of the center are typically much larger than on the other.
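A minimal sketch of why the standard deviation can mislead for skewed data, assuming a hypothetical sample of nonnegative waiting times with one large value. The familiar "mean plus or minus 2 SD" range, which works roughly for near-normal data, here extends below zero, while the median and quartiles describe the bulk of the data more faithfully.

```python
import statistics

# Hypothetical right-skewed, nonnegative data (e.g. waiting times in minutes);
# the single large value pulls the mean and standard deviation upward.
waits = [1, 1, 2, 2, 2, 3, 3, 4, 5, 40]

mean = statistics.mean(waits)
sd = statistics.stdev(waits)
median = statistics.median(waits)
q1, _, q3 = statistics.quantiles(waits, n=4)  # default 'exclusive' method

# For near-normal data, mean +/- 2 SD is a rough range for most values,
# but here its lower end is negative, which no waiting time can be.
print(f"mean +/- 2 SD: ({mean - 2 * sd:.1f}, {mean + 2 * sd:.1f})")
# The median and quartiles summarize this skewed data better.
print(f"median = {median}, quartiles = ({q1}, {q3})")
```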
Confidence intervals are another way of summarizing variability. The endpoints of, say, a 95% confidence interval for a mean are summary statistics (see Notes 1 and 2). However, they are summary statistics that give information about the variability of the sampling distribution of the mean of the original distribution, not about the original distribution itself.
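One way to see the difference is to simulate data and watch what happens as the sample size grows. The sketch below assumes a normal population with mean 100 and standard deviation 15 (arbitrary choices), and uses a large-sample z-interval rather than the t-interval as a simplification; `mean_ci_95` is a hypothetical helper. The sample standard deviation stays near 15, describing the original distribution, while the confidence interval narrows roughly in proportion to 1/sqrt(n), because it describes the sampling distribution of the mean.

```python
import random
import statistics

Z95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def mean_ci_95(data):
    """Approximate 95% CI for the mean (large-sample z-interval; an assumption)."""
    m = statistics.mean(data)
    se = statistics.stdev(data) / len(data) ** 0.5  # standard error of the mean
    return m - Z95 * se, m + Z95 * se

rng = random.Random(0)  # fixed seed so the run is reproducible
widths = {}
for n in (25, 100, 400, 1600):
    # Simulated sample from an assumed normal population (mean 100, SD 15).
    data = [rng.gauss(100, 15) for _ in range(n)]
    lo, hi = mean_ci_95(data)
    widths[n] = hi - lo
    # Sample SD stays near 15; the CI width shrinks as n grows.
    print(f"n={n:5d}  sample SD={statistics.stdev(data):5.1f}  "
          f"95% CI width={widths[n]:5.2f}")
```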
Notes
1. This uses the word "statistic" in the technical sense of "something calculated by a specific rule from data." The left and right endpoints of a 95% confidence interval satisfy this definition. Many (but not all) statistics are estimates of parameters. (A parameter is a number that depends on the distribution, that is, the random variable, itself, but does not depend on the data.) For example, the (sample) mean of a random sample of data from a distribution is an estimate of the (population) mean (also known as the expected value, or expectation) of the distribution. Similarly, the sample median is an estimate of the population median, the sample standard deviation is an estimate of the population standard deviation, and so on. The endpoints of confidence intervals are unusual in that they do not estimate parameters.
2. The left and right endpoints of a 90% confidence interval for the mean would be different statistics from the endpoints of a 95% confidence interval for the mean.
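The two notes can be made concrete with a short sketch. Here `mean_ci` is a hypothetical helper that computes approximate confidence-interval endpoints using the large-sample normal (z) approximation (a simplifying assumption; a t-interval would be more usual for small samples), and the data are made up. Each endpoint is a fixed rule applied to the data, so it is a statistic; changing the confidence level changes the rule and therefore gives different statistics, whereas the sample mean is a statistic that estimates a parameter.

```python
import statistics

def mean_ci(data, level):
    """Endpoints of an approximate CI for the mean at the given confidence
    level, using the large-sample normal (z) approximation (an assumption)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    m = statistics.mean(data)
    se = statistics.stdev(data) / len(data) ** 0.5
    return m - z * se, m + z * se

sample = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.7, 9.5, 12.4]  # made-up data

# Each endpoint is a statistic: a specific rule applied to the data.
lo90, hi90 = mean_ci(sample, 0.90)
lo95, hi95 = mean_ci(sample, 0.95)
# Same data, different rules, different statistics (the 95% interval is wider).
print(f"90%: ({lo90:.2f}, {hi90:.2f})   95%: ({lo95:.2f}, {hi95:.2f})")
# By contrast, the sample mean estimates a parameter (the population mean).
print(f"sample mean = {statistics.mean(sample):.2f}")
```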