COMMON MISTAKES IN USING STATISTICS: Spotting and Avoiding Them
What Is Probability?
The notion of "the probability of something" is one of those
ideas, like "point" and "time," that we can't define exactly,
but that are useful nonetheless. The following should give a good
working understanding of the concept.
Events
First, some related terminology: The "somethings" that we consider
the probabilities of are usually called events. For example, we may talk
about the event that the number showing on a die we have rolled is 5;
or the event that it will rain tomorrow; or the event that someone in a
certain group will contract a certain disease within the next five
years.
Four Perspectives on Probability
Four perspectives on probability are commonly used: Classical,
Empirical, Subjective, and Axiomatic.
1. Classical (sometimes called "A priori" or "Theoretical")
This is the perspective on probability that most people first encounter
in formal education (although they may encounter the subjective
perspective in informal education).
For example, suppose we consider tossing a fair die. There are six
possible numbers that could come up ("outcomes"), and, since the die is
fair, each one is equally likely to occur. So we say each of these
outcomes has probability 1/6. Since the event "an odd number comes up"
consists of exactly three of these basic outcomes, we say the
probability of "odd" is 3/6, i.e. 1/2.
More generally, if we have a situation (a "random process") in which
there are n equally likely outcomes, and the event A consists of
exactly m of these outcomes, we say that the probability of A is m/n.
We may write this as "P(A) = m/n" for short.
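The m/n rule above can be sketched in a few lines of Python. This is only an illustration: the function name classical_probability and its arguments are made up for this example, and it assumes the outcomes passed in really are equally likely.

```python
from fractions import Fraction


def classical_probability(event, outcomes):
    """Return P(A) = m/n for a process with n equally likely outcomes.

    `event` and `outcomes` are hypothetical names chosen for this sketch:
    `outcomes` lists the n equally likely outcomes, and `event` lists the
    outcomes that make up the event A.
    """
    outcomes = set(outcomes)
    favorable = set(event) & outcomes  # the m outcomes that belong to A
    return Fraction(len(favorable), len(outcomes))


die = range(1, 7)                                # the six faces of a fair die
print(classical_probability({5}, die))           # 1/6
print(classical_probability({1, 3, 5}, die))     # 1/2
```

Using Fraction keeps the answer exact, so 3/6 is reported as 1/2 rather than as a rounded decimal.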
This perspective has the advantage that it is conceptually simple for
many situations. However, it is limited, since many situations do not
have finitely many equally likely outcomes. Tossing a weighted die is
an example where we have finitely many outcomes, but they are not
equally likely. Studying people's incomes over time would be a
situation where we need to consider infinitely many possible outcomes,
since there is no way to say what a maximum possible income would be,
especially if we are interested in the future.
2. Empirical (sometimes called "A posteriori" or "Frequentist")
This perspective defines probability via a thought experiment.
To get the idea, suppose that we have a die which we are
told is weighted, but we don't know how it is weighted. We could get a
rough idea of the probability of each outcome by tossing the die a
large number of times and using the proportion of times that the die
gives that outcome to estimate the probability of that outcome.
This idea is formalized to define the probability of the event A as
P(A) = the limit, as n approaches infinity, of m/n,
where n is the number of times the process (e.g., tossing the die) is
performed, and m is the number of times the event A happens. (Notice
that m and n stand for different things in this definition from what
they meant in Perspective 1.)
In other words, imagine tossing the die 100 times, 1000 times, 10,000
times, ... . Each time we expect to get a better and better
approximation to
the true probability of the event A. The mathematical way of
describing this is that the true probability is the limit of the
approximations, as the number of tosses "approaches infinity" (that
just means that the number of tosses gets bigger and bigger
indefinitely).
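A small simulation can stand in for the thought experiment. The sketch below is an assumption-laden illustration, not a real experiment: the weighting (6 is five times as likely as any other face, so the true P(6) is 0.5) is invented for the example, and the die is simulated with Python's random module.

```python
import random


def empirical_probability(event, n, faces, weights, rng):
    """Estimate P(event) as m/n from n simulated tosses.

    m counts the tosses whose result lies in `event`. All parameter names
    are hypothetical; `weights` gives the relative likelihood of each face.
    """
    tosses = rng.choices(faces, weights=weights, k=n)
    m = sum(1 for face in tosses if face in event)
    return m / n


FACES = [1, 2, 3, 4, 5, 6]
WEIGHTS = [1, 1, 1, 1, 1, 5]   # invented weighting: true P(6) = 5/10 = 0.5

rng = random.Random(0)         # fixed seed so the run is repeatable
for n in (100, 1000, 10000, 100000):
    print(n, empirical_probability({6}, n, FACES, WEIGHTS, rng))
```

The printed proportions should settle near 0.5 as n grows, although no finite n gives the limit itself.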
This view of probability generalizes the first view: If we indeed have
a fair die, we expect that the number we will get from this definition
is the same as we will get from the first definition (e.g., P(getting
1) = 1/6; P(getting an odd number) = 1/2). In addition, this second
definition also works for cases
when outcomes are not equally likely, such as the weighted die. It also
works in cases where it doesn't make sense to talk about the
probability of an individual outcome. For example, we may consider
randomly picking a positive integer (1, 2, 3, ...) and ask, "What is
the probability that the number we pick is odd?" Intuitively, the answer
should be 1/2, since every other integer (when counted in order) is
odd. To apply this definition, we consider randomly picking 100
integers, then 1000 integers, then 10,000 integers, ... . Each time we
calculate what fraction of these chosen integers are odd. The resulting
sequence of fractions should give better and better approximations to
1/2.
However, the empirical perspective does have some disadvantages. First,
it involves a thought experiment. In some cases, the experiment could
never in practice be carried out more than once. Consider, for example,
the probability that the Dow Jones average will go up tomorrow. There
is only one today and one tomorrow. Going from today to tomorrow is not
at all like rolling a die. We can only imagine all possibilities of
going from today to a tomorrow (whatever that means). We can't actually
get an approximation.
A second disadvantage of the empirical perspective is that it leaves
open the question of how large n has to be before we get a good
approximation. As n increases, the running proportion may wobble away
from the true value, then wobble back toward it, so the approximation
does not even improve steadily.
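The wobbling can be made visible by tracking the error of the running proportion after every trial. The following is a minimal sketch under stated assumptions: the event is "a fair die shows 1" (true probability 1/6), the trials are simulated, and the helper name running_errors is hypothetical.

```python
import random


def running_errors(p_true, n, rng):
    """Return |m/k - p_true| after each of the first k = 1..n trials.

    Each trial succeeds with probability p_true; m is the running count
    of successes. Names are illustrative, not from the original text.
    """
    errors = []
    m = 0
    for k in range(1, n + 1):
        if rng.random() < p_true:
            m += 1
        errors.append(abs(m / k - p_true))
    return errors


errors = running_errors(1 / 6, 10000, random.Random(0))

# Count the steps on which one more trial made the approximation worse.
worse_steps = sum(1 for before, after in zip(errors, errors[1:]) if after > before)
print(worse_steps, "of", len(errors) - 1, "steps moved away from 1/6")
```

Any step counted here is one where an additional trial moved the estimate further from the true value, which is why the approach to the limit is not a steady one.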
The empirical view of probability is the one used in most statistical
inference procedures, which are called frequentist statistics. The
frequentist view is what gives credibility to standard estimates based
on sampling. For example, if we choose a large enough random sample from
a population (say, a random sample of 1000 students from the population
of all 50,000 students enrolled in the university), then the average of
some measurement (for example, college expenses) for the sample is a
reasonable estimate of the average for the population.
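The sampling idea can be tried out on made-up numbers. Everything in this sketch is synthetic: the 50,000 "expenses" are random values between 5,000 and 35,000 invented purely for illustration, and the function name estimate_mean is hypothetical.

```python
import random
import statistics


def estimate_mean(population, sample_size, rng):
    """Return the mean of a simple random sample drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population: 50,000 invented expense figures (not real data).
population = [rng.uniform(5000, 35000) for _ in range(50000)]

population_mean = statistics.mean(population)
sample_mean = estimate_mean(population, 1000, rng)

print("population mean:", round(population_mean, 2))
print("sample mean:    ", round(sample_mean, 2))
```

In a real study the population mean would be unknown; it is computed here only so the quality of the sample estimate can be seen.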
3. Subjective
Subjective probability is an individual person's measure of belief
that an
event will occur. With this view of probability, it makes perfectly
good sense intuitively to talk about the probability that the Dow Jones
average
will go up tomorrow. You can quite rationally take your subjective view
to agree with the classical or empirical views when they apply, so the
subjective perspective can be taken as an expansion of these other
views.
However, subjective probability also has its downsides. First, since it
is subjective, one person's probability (e.g., that the Dow Jones will
go up tomorrow) may differ from another's. This is disturbing to many
people. Still, it models the reality that often people do differ in
their judgments of probability.
The second downside is that subjective probabilities must obey certain
"coherence" (consistency) conditions in order to be workable. For
example, if you believe that the probability that the Dow Jones will go
up tomorrow is 60%, then to be consistent you cannot believe that the
probability that the Dow Jones will go down tomorrow is also 60%. It is
easy to fall into subjective probabilities that are not coherent.
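One simple coherence condition can be checked mechanically: probabilities assigned to events that cannot happen together must each lie between 0 and 1 and must not add up to more than 1. The sketch below checks only that condition; the function name is_coherent and the second set of beliefs are invented for illustration.

```python
def is_coherent(beliefs, tolerance=1e-9):
    """Check a basic coherence condition for mutually exclusive events.

    `beliefs` maps a label for each event to a subjective probability.
    The events are ASSUMED to be mutually exclusive; the function does
    not (and cannot) verify that assumption.
    """
    each_in_range = all(0 <= p <= 1 for p in beliefs.values())
    total_at_most_one = sum(beliefs.values()) <= 1 + tolerance
    return each_in_range and total_at_most_one


# The Dow Jones cannot both go up and go down tomorrow.
print(is_coherent({"up": 0.60, "down": 0.60}))   # False: the total is 1.20
print(is_coherent({"up": 0.60, "down": 0.35}))   # True: 0.05 is left for "unchanged"
```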
The subjective perspective of
probability fits well with Bayesian statistics, which are an
alternative to the more common frequentist statistical methods. (This
website will mainly focus on frequentist statistics.)
4. Axiomatic
This is a unifying perspective. The coherence conditions needed
for subjective probability can be proved to hold for the classical and
empirical definitions. The axiomatic perspective codifies these
coherence conditions, so it can be used with any of the above three
perspectives.
The axiomatic perspective says that probability is any function (we'll
call it P) from events to numbers satisfying the three conditions
(axioms) below. (Just what constitutes events will depend on the
situation where probability is being used.)
The three axioms of
probability:
- 0 ≤ P(E) ≤ 1 for every allowable event E. (In other
words, 0 is the smallest allowable probability and 1 is the largest allowable
probability).
- The certain event has probability 1. (The certain event is the event "some
outcome occurs." For example, in rolling a die, the certain event is
"One of 1, 2, 3, 4, 5, 6 comes up." In considering the stock market,
the certain event is "The Dow Jones either goes up or goes down or
stays the same.")
- The probability of the union of mutually exclusive events is
the sum of the probabilities of the individual events. (Two events are
called mutually exclusive if
they cannot both occur simultaneously. For example, the events "the die
comes up 1" and "the die comes up 4" are mutually exclusive, assuming
we are talking about the same toss of the same die. The union of events is the event that
at least one of the events occurs. For example, if E is the event "a 1
comes up on the die" and F is the event "an even number comes up on the
die," then the union of E and F is the event "the number that comes up
on the die is either 1 or even.")
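For a process with finitely many outcomes, the three axioms above can be checked by brute force. The following is a rough sketch, assuming a single die, treating every set of faces as an allowable event, and checking the third axiom for pairs of events; the names all_events, satisfies_axioms, fair_die, and squared are all invented here.

```python
from fractions import Fraction
from itertools import combinations

OUTCOMES = frozenset(range(1, 7))  # the six faces of one die


def all_events(outcomes):
    """Return every subset of the outcomes (every event, in this finite case)."""
    items = sorted(outcomes)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def satisfies_axioms(P, outcomes=OUTCOMES):
    """Check the three axioms for a function P from events to numbers."""
    events = all_events(outcomes)
    # Axiom 1: every probability lies between 0 and 1.
    if any(not (0 <= P(E) <= 1) for E in events):
        return False
    # Axiom 2: the certain event has probability 1.
    if P(frozenset(outcomes)) != 1:
        return False
    # Axiom 3: probabilities add over mutually exclusive events.
    for E, F in combinations(events, 2):
        if not (E & F) and P(E | F) != P(E) + P(F):
            return False
    return True


def fair_die(E):
    """Classical probability for a fair die: (number of faces in E) / 6."""
    return Fraction(len(E), 6)


def squared(E):
    """A deliberately broken assignment: obeys Axioms 1 and 2 but not 3."""
    return Fraction(len(E), 6) ** 2


print(satisfies_axioms(fair_die))  # True
print(satisfies_axioms(squared))   # False
```

The second function shows that the third axiom is a real restriction: it gives numbers between 0 and 1 and assigns 1 to the certain event, yet it is not a probability.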
If we have a fair die, the axioms of probability require that each number comes up
with probability 1/6: Since the die is fair, each number comes up with
the same probability. Since the outcomes "1 comes up," "2 comes up,"
..., "6 comes up" are mutually exclusive and their union is the certain
event, Axiom 3 says that
P(1 comes up) + P(2 comes up) +
... + P(6 comes up) = P(the certain event),
which is 1 (by Axiom 2). Since all six probabilities on the left
are equal, that common probability must be 1/6.
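Written out in symbols, the argument is short. In this sketch, P(k) abbreviates "P(k comes up)" and p is a label introduced here (it does not appear in the text above) for the common probability of each face:

```latex
\begin{align*}
P(1) + P(2) + \cdots + P(6) &= P(\text{the certain event}) = 1
  && \text{(third axiom, then second axiom)} \\
6p &= 1
  && \text{(fair die: } P(1) = \cdots = P(6) = p\text{)} \\
p &= \tfrac{1}{6}
\end{align*}
```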