COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
Simple Random Samples
The simplest type of random sample is a simple random sample, often
called an SRS. Moore and McCabe define a simple random
sample as follows:
"A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected." (See Note 1.)
Here, population refers to the collection of people, animals, locations, etc. that the study is focusing on.
Some examples:
1. In a medical study, the population might be all adults over age 50 who have high blood pressure.
2. In another study, the population might be all hospitals in the U.S. that perform heart bypass surgery.
3. If we are studying whether a certain die is fair or weighted, the population would be all possible tosses of the die.
In Example 3, it is fairly easy to get a simple random sample: just toss the die n times and record each outcome.
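As a rough illustration, the tossing procedure can be simulated in Python. This is only a sketch: the function name toss_die and its seed parameter are made up for this example, and the simulated die is assumed to be fair, which is exactly what a study of a real die could not take for granted.

```python
import random
from collections import Counter

def toss_die(n, seed=None):
    """Simulate n independent tosses of a six-sided die (assumed fair here)."""
    rng = random.Random(seed)  # the seed is only for reproducibility
    return [rng.randint(1, 6) for _ in range(n)]

# Record each outcome of n = 60 tosses, then tally how often each face came up.
outcomes = toss_die(60, seed=1)
counts = Counter(outcomes)
for face in range(1, 7):
    print(face, counts[face])
```

With a physical die, the list of outcomes would come from the actual tosses rather than from a random number generator; only the recording and tallying steps would carry over.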
Selecting a simple random sample in Examples 1 and 2 is much harder. A good way to select a simple random sample for Example 2 would proceed as follows:
First, obtain or make a list of all hospitals in the U.S. that perform heart bypass surgery. Number them 1, 2, ..., up to the total number M of hospitals in the population. (Such a list is called a sampling frame.)
Then use some sort of random number generating process (see Note 2) to obtain a simple random sample of size n from the population of integers 1, 2, ..., M. The simple random sample of hospitals would consist of the hospitals in the list that correspond to the numbers in the SRS of numbers.
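The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a prescribed method: the function name simple_random_sample and the list of 20 placeholder hospital names are hypothetical, and Python's pseudorandom generator stands in for the random number generating process.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return a simple random sample of size n from a sampling frame (a list)."""
    M = len(frame)
    if not 0 <= n <= M:
        raise ValueError("n must be between 0 and the frame size M")
    rng = random.Random(seed)  # the seed is only for reproducibility
    # Step 2: draw an SRS of n distinct numbers from 1, 2, ..., M.
    numbers = rng.sample(range(1, M + 1), n)
    # Number k corresponds to the k-th entry of the frame (list index k - 1).
    return [frame[k - 1] for k in numbers]

# Step 1: a hypothetical sampling frame of M = 20 hospitals (placeholder names).
frame = [f"Hospital {k}" for k in range(1, 21)]
sample = simple_random_sample(frame, 5, seed=42)
print(sample)
```

Here random.sample draws without replacement, so no hospital can appear in the sample twice.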
In theory, the same process could be used in Example 1. However,
obtaining the sampling frame would be much harder -- probably
impossible. So some compromises may need to be made.
Unfortunately, these compromises can easily lead to a sample that is biased
or otherwise not close enough to random to be suitable for the
statistical procedures used.
Indeed, even the sampling procedure described above is a compromise and may not be suitable in some situations, which are described in the next section.
Notes
1. Moore, David S. and George P. McCabe (2006), Introduction to the Practice of Statistics, fifth edition, Freeman, p. 219. The same definition appears on p. 196 of Moore, David S. (2007), The Basic Practice of Statistics, fourth edition, Freeman. These and other introductory texts by Moore and co-authors are among the best introductory texts for pointing out many of the common errors in using statistics.
2. Think of the process that is used in selecting winning numbers
in some lotteries:
Put M balls, labeled 1, 2, ..., M, in a container that can mix the
balls up thoroughly. After mixing, select one ball (without looking at
the number
on it or any other ball), mix again, select a second ball (without
looking at numbers), mix again, and continue until n balls have been
selected. In practice, computer processes that are (we hope) close
enough to random are used.
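For small M and n, it is possible to check by brute force that the ball-drawing process satisfies the definition of a simple random sample. The sketch below assumes that thorough mixing makes every remaining ball equally likely at each draw; the function name draw_probabilities is made up for this illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def draw_probabilities(M, n):
    """Exact probability of each set of n balls when drawing one ball at a time."""
    probs = {}
    for seq in permutations(range(1, M + 1), n):
        # Probability of this particular order of draws:
        # 1/M for the first ball, 1/(M-1) for the second, and so on.
        p = Fraction(1)
        for i in range(n):
            p *= Fraction(1, M - i)
        # Different orders of the same balls give the same sample (set).
        key = frozenset(seq)
        probs[key] = probs.get(key, Fraction(0)) + p
    return probs

probs = draw_probabilities(5, 2)
print(len(probs), comb(5, 2))   # number of distinct samples found, and C(5, 2)
print(set(probs.values()))      # the distinct probabilities that occur
```

With M = 5 and n = 2, each of the 10 possible pairs of balls has probability 1/10, so every set of n individuals has an equal chance of being the sample selected.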