Probability 1: Fundamentals

The probability of rain tomorrow is 30% (or 0.3).
What does that mean?

How does the weatherman know?

There are three basic methods for finding probabilities:

• empirically through many repetitions of an experiment - relative frequency interpretation
• through reasoning about outcomes etc. - classical interpretation
• by using our intuition and experience - subjective interpretation

Example - coin tossing
What is the probability of getting "heads" when tossing a fair coin?

• relative frequency interpretation: take a coin and flip it!
The South African mathematician John Kerrich, while in a German POW camp during WWII, tossed a coin 10000 times. Result: 5067 heads, for a relative frequency of 5067/10000 = 0.5067

• classical interpretation:
This experiment has two possible outcomes - heads and tails. Fair means they are equally likely, so p=P("heads")=P("tails")=0.5

• subjective interpretation: I think it's ½
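The relative frequency interpretation can be tried out on a computer instead of with a real coin. The following is only a sketch: the seed and the variable names (tosses, running_prop) are arbitrary choices made for this illustration.

```r
# Simulate n tosses of a fair coin and track the running proportion of heads.
set.seed(1)    # arbitrary seed, used only to make the run reproducible
n <- 10000     # same number of tosses as in Kerrich's experiment

tosses <- sample(c("H", "T"), size = n, replace = TRUE)

# Proportion of heads among the first 1, 2, ..., n tosses
running_prop <- cumsum(tosses == "H") / seq_len(n)

# Proportion of heads after 10, 100, 1000 and 10000 tosses
running_prop[c(10, 100, 1000, 10000)]
```

The proportions should settle down near 0.5 as the number of tosses grows, which is the idea behind this interpretation.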

An experiment is a well-defined procedure that produces one of a set of possible outcomes. For example, "roll a die"; "randomly select a card from a standard 52-card deck"; "flip a coin" and "pick any moment in time between 10am and 12pm" are experiments. A sample space is the set of all possible outcomes of an experiment. Thus, for "flip a coin" the sample space is {H, T}, for "roll a die" the sample space is {1, 2, 3, 4, 5, 6} and for "pick any moment in time between 10am and 12pm" the sample space is [10, 12].
An event is a subset, say A, of a sample space S. For the experiment "roll a die", an event is "obtain a number less than 3". Here, the event is {1, 2}.
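If all the outcomes of a sample space S are equally likely and if A is an event, then the probability of A is:

P(A) = n(A)/n(S)

where n(A) is the number of outcomes in A and n(S) is the number of outcomes in S.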

So, the probability of an event, say A, is the ratio of the number of successful outcomes to the total number of outcomes. For example, when flipping a coin, what is the probability of heads? Here, the total number of outcomes is 2 and the number of ways to be successful is 1. Thus, P(heads) = 1/2. As another example, consider randomly selecting a card from a standard 52-card deck: what is the probability of getting a king? Here, the total number of outcomes is 52 and of these outcomes 4 would be successful. So, P(king) = 4/52 = 1/13.
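The card example can be checked by listing the sample space and counting. This is a minimal sketch; the rank and suit labels and the names deck, n_S, n_A and p_king are assumptions made for the illustration.

```r
# Build a standard 52-card deck as all combinations of rank and suit.
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("clubs", "diamonds", "hearts", "spades")
deck  <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

n_S <- nrow(deck)              # total number of outcomes
n_A <- sum(deck$rank == "K")   # number of outcomes in the event "king"

p_king <- n_A / n_S            # ratio of successes to total
p_king
```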

Example: What is the probability of a sum of 8 when rolling two fair dice?
Solution 1: Sample space is
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

The 5 pairs (2,6), (3,5), (4,4), (5,3) and (6,2) have a sum of 8, so P(sum of 8)=5/36=0.1389
Solution 2: The sum can be any number from 2 to 12, so the sample space is {2,3,4,..,11,12}. There are 11 numbers in the sample space, one of them is 8, so P(sum of 8)=1/11=0.091
Which is right?

Let's do a simulation to see which answer is correct.
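Use the command "sample" to randomly pick elements from a set:
args(sample) shows you the correct syntax of the "sample" command
sample(1:6,size=2,replace=TRUE) picks two numbers from 1 to 6 with replacement
sum(sample(1:6,size=2,replace=TRUE)) finds their sum, just what we want
z=numeric(100000) generates a vector of length 100000 to hold the results
for(i in 1:100000) z[i]=sum(sample(1:6,size=2,replace=TRUE)) repeats our experiment 100000 times
length(z[z==8])/100000 finds the proportion of "8's" in z
The proportion comes out close to 0.139, so Solution 1 is right.
But why is it right? The formula "successes divided by total" requires equally likely outcomes. The 36 pairs are equally likely, but the 11 possible sums are not: a sum of 2 can only happen as (1,1), while a sum of 8 can happen in 5 ways.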

The above sequence of R commands is nice and easy, but not very efficient. Here is a one-line command that does it all:

length(c(1:100000)[apply(matrix(sample(1:6,size=200000,replace=T),ncol=2),1,sum)==8])/100000
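For comparison, here is a sketch that finds the exact probability by enumerating all pairs and also repeats the simulation in a somewhat more readable vectorized form. The seed and the names sums, rolls, p_exact and p_sim are arbitrary choices for this illustration.

```r
# Exact answer: enumerate the sums of all 36 equally likely ordered pairs.
sums    <- outer(1:6, 1:6, "+")   # 6 x 6 matrix, entry [i, j] is i + j
p_exact <- mean(sums == 8)        # fraction of pairs with sum 8

# Probabilities of the sums 2, ..., 12; they are not all equal
table(sums) / 36

# Simulation: each row of rolls is one throw of two dice.
set.seed(1)                       # arbitrary seed for reproducibility
n     <- 100000
rolls <- matrix(sample(1:6, size = 2 * n, replace = TRUE), ncol = 2)
p_sim <- mean(rowSums(rolls) == 8)

c(exact = p_exact, simulated = p_sim)
```

Using mean() on a logical vector gives the proportion of TRUE values, which avoids the explicit length(...)/n step.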

Fundamentals

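The definition above works well as long as S is finite and all outcomes are equally likely, but it breaks down if S is infinite. Instead modern probability, like geometry, is built on a small set of basic rules called axioms, formulated in the 1930s by Kolmogorov. They are:

Axiom 1: 0 ≤ P(A) ≤ 1 for any event A
Axiom 2: P(S) = 1
Axiom 3: P(A1 ∪ ... ∪ An) = P(A1) + ... + P(An) if A1, ..., An are mutually exclusive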

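Example: Derive the formula above (for a finite sample space with equally likely outcomes) from these axioms.

Solution: say we have a sample space S={e1, ..., en} and an event A={ek1, ..., ekm}. Equally likely means P({e1}) = ... = P({en}) = p for some number p. The sets {e1}, ..., {en} are mutually exclusive and their union is S, so by the second and third axioms
1 = P(S) = P({e1}) + ... + P({en}) = np, and so p = 1/n
Then, again by the third axiom:
P(A) = P({ek1}) + ... + P({ekm}) = mp = m/n = n(A)/n(S)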

Set Theory

Recall the following formulas for sets:
Commutativity
A∪B = B∪A
A∩B = B∩A

Associativity
A∪(B∪C) = (A∪B)∪C
A∩(B∩C) = (A∩B)∩C

Distributive Law
A∪(B∩C) = (A∪B)∩(A∪C)
A∩(B∪C) = (A∩B)∪(A∩C)

DeMorgan's Law
(A∪B)^c = A^c ∩ B^c
(A∩B)^c = A^c ∪ B^c
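These set identities can be spot-checked on small finite sets with R's built-in set functions union, intersect, setdiff and setequal. The sets S, A, B and C below and the helper comp() are made up for this illustration, and a check on one example is of course not a proof.

```r
# A small universe S and three arbitrary subsets of it
S <- 1:10
A <- c(1, 2, 3, 4)
B <- c(3, 4, 5, 6)
C <- c(4, 6, 8, 10)

# Complement relative to S (helper defined only for this sketch)
comp <- function(X) setdiff(S, X)

checks <- c(
  distributive_1 = setequal(union(A, intersect(B, C)),
                            intersect(union(A, B), union(A, C))),
  distributive_2 = setequal(intersect(A, union(B, C)),
                            union(intersect(A, B), intersect(A, C))),
  demorgan_1     = setequal(comp(union(A, B)),
                            intersect(comp(A), comp(B))),
  demorgan_2     = setequal(comp(intersect(A, B)),
                            union(comp(A), comp(B)))
)
checks
```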

Some useful formulas

Addition Formula: P(A∪B) = P(A)+P(B)-P(A∩B)
Proof: first note that
A∪B = (A∩B^c) ∪ (A∩B) ∪ (A^c∩B)
and that all of these are disjoint. Therefore by the third axiom we have
P(A∪B) = P(A∩B^c) + P(A∩B) + P(A^c∩B)
but
P(A∩B^c) + P(A∩B) + P(A^c∩B) =
{P(A∩B^c)+P(A∩B)} + {P(A^c∩B)+P(A∩B)} - P(A∩B) =
P( (A∩B^c)∪(A∩B) ) + P( (A^c∩B)∪(A∩B) ) - P(A∩B) =
P( A∩(B^c∪B) ) + P( (A^c∪A)∩B ) - P(A∩B) =
P(A∩S)+P(S∩B)-P(A∩B) =
P(A)+P(B)-P(A∩B)

Example: We roll two fair dice. What is the probability of a sum of 5 or 8, or that the highest number on either die is at most 3?
The sample space is the one listed above.
Event A = "sum of 5 or 8" = {(1,4), (2,3), (3,2), (4,1), (2,6), (3,5), (4,4), (5,3), (6,2)}, n(A) = 9
Event B = "highest number is at most 3" = {(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)}, n(B) = 9
Event A∩B = {(2,3), (3,2)}, n(A∩B) = 2
P(A∪B) = P(A)+P(B)-P(A∩B) = 9/36+9/36-2/36 = 16/36 = 4/9
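The counts in this example can be verified by enumeration. In the sketch below events are represented as logical vectors over the 36 outcomes, so that | plays the role of union and & the role of intersection. The names dice, p, lhs and rhs are choices made for this illustration.

```r
# All 36 equally likely outcomes of rolling two dice
dice <- expand.grid(d1 = 1:6, d2 = 1:6)

A <- (dice$d1 + dice$d2) %in% c(5, 8)   # sum of 5 or 8
B <- pmax(dice$d1, dice$d2) <= 3        # highest number is at most 3

# With equally likely outcomes, P(event) is the fraction of TRUE entries
p <- function(event) mean(event)

lhs <- p(A | B)                  # P(A or B), counted directly
rhs <- p(A) + p(B) - p(A & B)    # addition formula

c(n_A = sum(A), n_B = sum(B), n_AB = sum(A & B), lhs = lhs, rhs = rhs)
```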

Bonferroni's Inequality
P(A∩B) ≥ P(A)+P(B)-1
The proof follows directly from the Addition Formula and P(A∪B)≤1

Example: Say we know that the probability of a hurricane hitting the USA next year is 75% and that the probability of a strong earthquake in the USA next year is 60%. What can we say about the probability of both happening?
P("Hurricane" and "Earthquake") ≥ P("Hurricane")+P("Earthquake")-1 = 0.75+0.6-1 = 0.35
So the probability of both happening is at least 35%.
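Bonferroni's Inequality gives only a lower bound. As a sketch, the hypothetical helper joint_bounds() below combines it with an upper bound that is an addition here: since "A and B" is contained in both A and B, its probability can be no larger than the smaller of P(A) and P(B).

```r
# Bounds on P(A and B) when only P(A) and P(B) are known.
# joint_bounds is a hypothetical helper written for this illustration.
joint_bounds <- function(pA, pB) {
  c(lower = max(0, pA + pB - 1),   # Bonferroni's Inequality (and P >= 0)
    upper = min(pA, pB))           # "A and B" is a subset of A and of B
}

b <- joint_bounds(0.75, 0.60)      # hurricane and earthquake example
b
```

Without further information about how the two events are related, any value between these two bounds is possible.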

Complement: P(A) = 1 - P(A^c)
Proof: S = A ∪ A^c, and A and A^c are disjoint, so
1 = P(S) = P(A ∪ A^c) = P(A) + P(A^c)

Example: A fair coin is tossed 5 times. What is the probability of at least one "Heads"?
Sample Space S={(H,H,H,H,H),(H,H,H,H,T), ... , (T,T,T,T,T)}
S has 2^5 = 32 elements, all equally likely
P(at least one "Heads") = 1 - P("No Heads") = 1 - P({(T,T,T,T,T)}) = 1 - 1/32 = 31/32
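The complement calculation can be compared with a direct count over the whole sample space. This is a minimal sketch; the names tosses, no_heads, p_at_least_one and p_direct are chosen for the illustration.

```r
# All 2^5 sequences of five tosses, one sequence per row
tosses <- expand.grid(rep(list(c("H", "T")), 5), stringsAsFactors = FALSE)
n_S <- nrow(tosses)

n_heads  <- rowSums(tosses == "H")   # number of heads in each sequence
no_heads <- n_heads == 0             # TRUE only for (T,T,T,T,T)

p_at_least_one <- 1 - mean(no_heads) # complement formula
p_direct       <- mean(n_heads >= 1) # direct count, for comparison

c(n_S = n_S, complement = p_at_least_one, direct = p_direct)
```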

Subset
If A⊆B then P(A)≤P(B)
Proof: B = B∩S = B∩(A∪A^c) =
(B∩A) ∪ (B∩A^c) =
A ∪ (B∩A^c), where B∩A = A because A⊆B, and A and B∩A^c are disjoint, so
P(B) = P(A ∪ (B∩A^c)) = P(A) + P(B∩A^c) ≥ P(A)

The Addition Formula implies P(A∪B) ≤ P(A)+P(B). There is an obvious extension of this to n events, called Boole's Inequality:
P(A1 ∪ ... ∪ An) ≤ P(A1) + ... + P(An)
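Boole's Inequality can be illustrated with three overlapping events for two dice. The events below are made up for this sketch, and one example only illustrates the inequality; it does not prove it.

```r
# All 36 equally likely outcomes of rolling two dice
dice <- expand.grid(d1 = 1:6, d2 = 1:6)

# Three events that overlap, each a logical vector over the outcomes
events <- list(
  sum_is_8   = dice$d1 + dice$d2 == 8,
  doubles    = dice$d1 == dice$d2,
  first_is_6 = dice$d1 == 6
)

p_union <- mean(Reduce(`|`, events))   # probability of the union
sum_p   <- sum(sapply(events, mean))   # sum of the single probabilities

c(p_union = p_union, sum_p = sum_p)
p_union <= sum_p
```

The sum is larger than the probability of the union because outcomes such as (4,4) and (6,6) belong to more than one event and are counted more than once.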