Probability : Basics

Why Probability?

You play a game with a friend. It is a really boring game: you flip a coin. If it comes up tails you give your friend $1, if it comes up heads he gives you $1. You have just played the game 10 times, and there were 2 heads and 8 tails, so you are $6 behind. Now you start to think that maybe your friend is not so good a friend and maybe he is trying to cheat you with a "loaded" coin. It seems there are two possibilities:

• the coin is fair, and you were just having bad luck
• the coin is loaded.

In the first case you can go on playing (and hopefully win your money back), in the second case you stop playing (and have a discussion with this guy about your $6).

How are you going to decide? Well, the obvious question is this: how likely is it that the coin is fair?
In Statistics we might consider the following question: Assuming the coin is fair, what is the probability that it comes up heads no more than 2 times in 10 flips?
If this probability is not so small, you have no reason to be suspicious; if it is small, stop playing!

But how small does it have to be before you stop playing?
• 10%?
• 1%?
• 0.1%?

Obviously we need to know this probability to make an informed decision!

So, what is the probability? Let's do a simulation to find out:

• make a column called 'Coin' with 'H' and 'T'
• flip the coin 1000 times with Calc > Random Data > sample 1000 rows from column: Coin, store in c2, check Sample with replacement
• flip the coin 1000 times with Calc > Random Data > sample 1000 rows from column: Coin, store in c3, check Sample with replacement
• repeat in the same way for columns c4 through c10, and finally
• flip the coin 1000 times with Calc > Random Data > sample 1000 rows from column: Coin, store in c11, check Sample with replacement
• Calc > Calculator, Store in: c12, Expression: (c2="H")+(c3="H")+(c4="H")+(c5="H")+(c6="H")+(c7="H")+(c8="H")+(c9="H")+(c10="H")+(c11="H")
• Count outcomes with Stat > Tables > Tally Individual Variables: c12

Here is the result of one such simulation:
Number of Heads    Number of Times
0                  2
1                  14
2                  44
3                  95
4                  223
5                  263
6                  191
7                  111
8                  47
9                  9
10                 1
So the estimated probability of at most two heads is (2+14+44)/1000 = 0.06.

Later we will see how to calculate this probability exactly, and it turns out to be 0.0547, so our simulation did pretty well!
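The same experiment can also be sketched in Python instead of Minitab. This is only a minimal sketch and not part of the Minitab workflow above: the function names `simulate_heads_counts` and `exact_prob_at_most` are made up for illustration, and the seed is arbitrary. The exact value is found by counting the sequences of 10 flips with at most 2 heads, a method that is explained later.

```python
import random
from collections import Counter
from math import comb

def simulate_heads_counts(n_games=1000, n_flips=10, seed=1):
    """Tally the number of heads in n_flips fair flips, over n_games repetitions."""
    rng = random.Random(seed)  # arbitrary seed, used only for reproducibility
    counts = Counter()
    for _ in range(n_games):
        heads = sum(rng.choice("HT") == "H" for _ in range(n_flips))
        counts[heads] += 1
    return counts

def exact_prob_at_most(k, n_flips=10):
    """Exact P(at most k heads): sequences with 0..k heads out of 2**n_flips."""
    return sum(comb(n_flips, i) for i in range(k + 1)) / 2 ** n_flips

counts = simulate_heads_counts()
estimate = sum(counts[h] for h in range(3)) / sum(counts.values())
print("simulated P(at most 2 heads):", estimate)
print("exact P(at most 2 heads):", round(exact_prob_at_most(2), 4))  # 56/1024
```

The simulated value changes from run to run (or with the seed), but it should stay close to the exact one.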

For a real-life example recall the 1970 draft. There we had a sample correlation coefficient r=-0.226. Again there are two possibilities:

• the draft was fair, and what we observed was a random fluctuation.
• the draft was not fair.

Again a decision can be based on the probability of a correlation as unusual as -0.226 (or even worse) happening in a fair draft. Our simulation showed that this probability is less than 0.00001, so here there is not much of a question which option we should believe.

The topics we have discussed so far in this class (making graphs, computing things like the mean or the correlation coefficient) go under the heading of descriptive statistics. For the game above this is very simple: there were 2 heads in 10 flips of the coin. What we want to do now is make a guess about what the true "state of nature" is based on the available information, namely decide whether 2 heads in 10 flips is compatible with the use of a fair coin. This type of problem goes under the heading of inferential statistics.

Introduction

The probability of rain tomorrow is 30%. What does that mean?

We usually find probabilities in one of three ways:

• empirically through many repetitions of an experiment - relative frequency interpretation
• through reasoning about outcomes etc. - classical interpretation
• by how we feel about it - subjective interpretation

Example - coin tossing
what is the probability of getting "heads" when tossing a fair coin?

using the relative frequency interpretation: the South African mathematician John Kerrich, while interned in German-occupied Denmark during WWII, tossed a coin 10000 times. Result: 5067 heads, for an estimated probability of 0.5067

using the classical interpretation: this experiment has two possible outcomes - heads and tails. Fair means they are equally likely, so p=P("heads")=P("tails")=0.5

using the subjective interpretation: I don't know, maybe 0.6?

A little bit of notation

An experiment is a well-defined procedure that produces a set of outcomes. For example:
• "flip a coin"
• "roll a die"
• "flip a coin twice"
• "randomly select a card from a standard 52-card deck"
are experiments.

A sample space is the set of outcomes from an experiment. Thus
• for "flip a coin" the sample space is {H, T}
• for "roll a die" the sample space is {1, 2, 3, 4, 5, 6}
• for "flip a coin twice" the sample space is {HH, HT, TH, TT}
• for "randomly select a card from a standard 52-card deck" the sample space is {2H, 2D, 2S, 2C, 3H, .., AC}

You need to be a little careful here. Consider again the experiment "flip a coin twice". In this experiment it is possible to distinguish between the outcome "first flip is heads and second flip is tails", which we denoted by "HT", and the outcome "first flip is tails and second flip is heads", which we denoted by "TH". Contrast this with the experiment "two identical coins are tossed at the same time". Now all we can say is that one possible outcome is "one coin came up heads, the other came up tails", but this is the same as "one coin came up tails, the other came up heads", so now the sample space is {HH,HT,TT}.
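The difference between the two sample spaces can be made concrete by listing them with a short program. The following Python lines are only an illustrative sketch; the variable names are made up, and the card labels follow the notation used above (rank followed by suit).

```python
from itertools import combinations_with_replacement, product

# "Flip a coin twice": the order is observable, so HT and TH are different outcomes.
ordered = ["".join(p) for p in product("HT", repeat=2)]
print(ordered)    # ['HH', 'HT', 'TH', 'TT']

# "Two identical coins tossed at the same time": the order cannot be observed.
unordered = ["".join(c) for c in combinations_with_replacement("HT", 2)]
print(unordered)  # ['HH', 'HT', 'TT']

# A standard 52-card deck: 13 ranks times 4 suits.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = "HDSC"
deck = [rank + suit for rank in ranks for suit in suits]
print(len(deck), deck[:4], deck[-1])  # 52 ['2H', '2D', '2S', '2C'] AC
```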

An event is a subset, say A, of a sample space S.

For the experiment "roll a die", an event is "obtain a number less than 3". Often we can write events in terms of the outcomes in the sample space. Here, the event is {1, 2}.
For the experiment "flip a coin twice" one event is "no Tails" = {HH}

Finding the probability of an event, A, from a sample space, S, is easy if all the outcomes in the sample space are equally likely, that is, have the same probability. Then:

P(A) = n(A)/n(S)

So, the probability of an event is the number of outcomes in the event divided by the number of outcomes in the sample space.

For example

• what is the probability of heads when flipping a fair coin?  Here S={H,T}, so n(S)=2. A={H}, so n(A)=1.  Therefore P(A) = n(A)/n(S) = 1/2

• what is the probability of an even number when rolling a fair die? Here S={1, 2, 3, 4, 5, 6}, so n(S)=6. A={2, 4, 6}, so n(A)=3. Therefore P(A)=n(A)/n(S)=3/6=1/2

• what is the probability of no Tails when flipping a fair coin twice? S={HH, HT, TH, TT}, so n(S)=4. A= {HH}, so n(A)=1, Therefore P(A)=n(A)/n(S)=1/4
Notice that I said a fair coin. It is because of this that the four outcomes in the sample space are equally likely. If the coin were loaded they would not be, and our formula would not work.

• what is the probability of getting a king when randomly selecting a card from a standard 52-card deck?  S={2H, 2D, 2S, 2C, 3H, .., AC}, so n(S)=52. A={KH, KD, KS, KC}, so n(A)=4. Therefore P(king) = 4/52 = 1/13.
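The examples above all follow the same recipe, which can be written as a small Python function. This is a minimal sketch: the name `classical_prob` is made up for illustration, and the function is only meaningful when all outcomes in the sample space are equally likely.

```python
from fractions import Fraction

def classical_prob(event, sample_space):
    """P(A) = n(A)/n(S); valid only if all outcomes in S are equally likely."""
    sample_space = set(sample_space)
    event = set(event)
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# Even number when rolling a fair die
print(classical_prob({2, 4, 6}, range(1, 7)))                   # 1/2

# No tails when flipping a fair coin twice
print(classical_prob({"HH"}, {"HH", "HT", "TH", "TT"}))        # 1/4

# A king when drawing one card from a standard 52-card deck
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = [rank + suit for rank in ranks for suit in "HDSC"]
kings = {card for card in deck if card.startswith("K")}
print(classical_prob(kings, deck))                              # 1/13
```

Using `Fraction` keeps the answers as exact fractions, so they can be compared directly with the hand calculations.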

Example: What is the probability of a sum of 8 when rolling two fair dice?
Solution 1: Sample space is
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

so n(S)=36
The 5 outcomes (2,6), (3,5), (4,4), (5,3) and (6,2) have a sum of 8, so n("sum is 8")=5, therefore P("sum of 8")=5/36=0.1389

Solution 2: The sum can be any number from 2 to 12, the sample space is S={2,3,4,..,11,12}. There are 11 numbers in the sample space, one of them is 8, so P("sum of 8")=1/11=0.091

Which is right? Let's do a simulation:

• make a column Die with entries 1-6
• make column Roll First Die by Calc > Random Data > Sample from columns, sample 10000, from Die, store in Roll First Die, check box Sample with replacement
• repeat for column Roll Second Die
• compute column Sum of Dice (Calc > Calculator)
• Stat > Tables > Tally Individual Variables: Sum of Dice, include percentages

So solution 1 is right, but why?

It is because the outcomes in the sample space of Solution 2 are not equally likely, for example P("2")=1/36=0.028 and P("7")=6/36=0.167, so the formula P(A)=n(A)/n(S) does not work here. For the first solution we have P((i,j))=1/36 for all i=1,..,6 and j=1,..,6, so there the formula is ok.
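Both the exact probabilities and the simulation can be reproduced with a few lines of Python. This is an illustrative sketch, not the Minitab procedure itself; the variable names and the seed are arbitrary choices.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Exact: count the sums over the 36 equally likely ordered pairs (i, j).
pairs = list(product(range(1, 7), repeat=2))
sum_counts = Counter(i + j for i, j in pairs)
exact = {s: Fraction(n, len(pairs)) for s, n in sorted(sum_counts.items())}
print(exact[8])                          # 5/36
print(float(exact[2]), float(exact[7]))  # the 11 sums are not equally likely

# Simulation: roll two dice 10000 times and tally the sums.
rng = random.Random(2)  # arbitrary seed, used only for reproducibility
n_rolls = 10000
rolls = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
print("simulated P(sum of 8):", rolls[8] / n_rolls)
```

The simulated relative frequency should land near 5/36 = 0.1389, not near 1/11 = 0.091.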

For any event A we have
0≤P(A)≤1.
That is, the probability is always between 0 and 1 (Often we express probabilities as percentages, such as the probability of rain tomorrow is 30%, but that means the same as 0.3).
P(A)=0 means A cannot happen.  For example, consider the experiment "roll a die": P(number greater than 6) = 0.  If P(A) = 1 then event A must happen. In the "roll a die" experiment, P(number ≤ 6) = 1.

Warning This means that if you are asked to find a probability and you come up with the answer P(A)=2.34, you know it has to be wrong!

Complement

Example Somebody tells you there is a 30% chance of rain tomorrow. Obviously, then, there is a 70% chance that it will not rain tomorrow! This is an example of the complement:

The complement (opposite) of an event A is denoted by Ac.

Example We roll a fair die twice. A="at least one six", then Ac="no six"

Example We randomly pick an employee from WRInc. A="Employee is Female", then Ac="Employee is not female" = "Employee is male"

Example We randomly pick an employee from WRInc. A="Employee makes more than $50000", then Ac="Employee makes $50000 or less"

We have the formula
P(A) = 1-P(Ac)

Example We randomly pick an employee from WRInc. What is the probability that the employee's job level is less than 7?
Let A="job level is less than 7", then Ac="job level is 7" (7 being the highest job level), and so
P("job level is less than 7") = P(A) = 1-P(Ac) = 1-18/527 = 0.966

Example We flip a fair coin 10 times. What is the probability of getting "heads" at least once?
Solution: An outcome of this experiment is a sequence of 10 H's and T's, for example HHTHTTHTHH. We can write down the sample space as follows:
S = {TTTTTTTTTT, TTTTTTTTTH, TTTTTTTTHT, TTTTTTTTHH, ..., HHHHHHHHHT, HHHHHHHHHH}
How many such outcomes are there? It turns out that
n(S) = 2×2×2×2×2×2×2×2×2×2 = 2^10 = 1024
Now A = "at least one heads", so
Ac = "no heads" = {TTTTTTTTTT}, so P(Ac) = n(Ac)/n(S) = 1/1024, and so
P(A) = 1-P(Ac) = 1-1/1024 = 1023/1024
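Since the sample space has only 1024 outcomes, a computer can list all of them and check the complement calculation directly. The following Python lines are a small sketch with made-up variable names.

```python
from fractions import Fraction
from itertools import product

# All sequences of 10 flips, e.g. "HHTHTTHTHH"
outcomes = ["".join(p) for p in product("HT", repeat=10)]
n_S = len(outcomes)

# Complement event: "no heads"
no_heads = [o for o in outcomes if "H" not in o]
p_complement = Fraction(len(no_heads), n_S)
p_at_least_one = 1 - p_complement

# Direct count of "at least one heads", for comparison
p_direct = Fraction(sum("H" in o for o in outcomes), n_S)

print(n_S, p_complement, p_at_least_one)  # 1024 1/1024 1023/1024
print(p_direct == p_at_least_one)         # True
```

The direct count has to look at 1023 outcomes, while the complement consists of a single one, which is why the complement route is so much easier by hand.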

The complementation formula is useful if the event A is big and complicated but the event Ac is small and simple.

Independence

Two events A and B are said to be independent if knowing whether one of them has occurred does not affect the probability of the other.

Example We roll a fair die twice. What is the probability of a six on the second roll?
The sample space S is the same as above, so n(S)=36.
Let A = "six on the second roll" = {(1,6), (2,6), (3,6), (4,6), (5,6), (6,6)}, so n(A)=6, so P(A)=n(A)/n(S)=6/36=1/6.

Now assume we already rolled the die once, and we got a six. What is the probability that we get another six on the second roll?

Here S={1, 2, 3, 4, 5, 6}, A={6}, so P(A)=n(A)/n(S)=1/6

But that is the same as before. So whether we know the outcome of the first roll or not, the probability of a six on the second roll is always 1/6!
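This comparison can also be checked by counting outcomes in the 36-element sample space. The following Python sketch uses made-up variable names; "knowing that the first roll was a six" is handled by keeping only those outcomes whose first entry is 6.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # all 36 pairs (first roll, second roll)

# Without any knowledge of the first roll
A = [o for o in S if o[1] == 6]           # six on the second roll
p_A = Fraction(len(A), len(S))

# Knowing the first roll was a six: only these outcomes remain possible
B = [o for o in S if o[0] == 6]
A_within_B = [o for o in B if o[1] == 6]
p_A_knowing_B = Fraction(len(A_within_B), len(B))

print(p_A, p_A_knowing_B)                 # 1/6 1/6
```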

Example It is known that in a certain population 5% of the men are color-blind. We randomly select 2 men. We check the first and find he is not color-blind. Even so, the probability that the second man is color-blind is still 5%; it has not changed.

(This would not be true if the two men were brothers, for example. So we need to be careful to find a true random sample)

A Useful Formula

Example We randomly select two cards from a standard deck. What is the probability that both are Kings?
Now an outcome of this experiment might be "pick the Ace of Hearts and the Eight of Diamonds", which we could denote by (AH,8D). The sample space could be written as

S={(2H,2D), (2H,2C), ..., (AC,AS)}

But even finding out n(S) would not be so easy (actually n(S)=52·51/2 = 1326). But there is an easier way to find this probability:

Instead of picking two cards at once we will first pick one card, and then pick another from the rest. Then

P(two K) = P(first is K)P(second is K of the rest) = 4/52·3/51 = .0045

This extends quite nicely:

P(first four cards drawn are K) = 4/52·3/51·2/50·1/49 = 1/270725
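As a check, the two-kings probability can be computed both ways in Python: by listing all two-card hands and counting, and by multiplying the step-by-step probabilities. This is only a sketch with made-up variable names; the card labels follow the notation used above.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = [rank + suit for rank in ranks for suit in "HDSC"]

# Counting: all unordered two-card hands, and those made of two kings
hands = list(combinations(deck, 2))
both_kings = [h for h in hands if all(card.startswith("K") for card in h)]
by_counting = Fraction(len(both_kings), len(hands))

# Multiplying: first card is a king, then a king among the remaining 51 cards
by_formula = Fraction(4, 52) * Fraction(3, 51)
print(len(hands), by_counting, by_formula)  # 1326 1/221 1/221

# Four kings in a row: there are 4 kings, so the first factor is 4/52
four_kings = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)
print(four_kings, four_kings == Fraction(1, comb(52, 4)))  # 1/270725 True
```

Here 1/221 is about 0.0045, and the last line shows that the product for four kings agrees with counting the single four-king hand among all four-card hands.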

For more on basic probability see page 207 of the textbook.