
Example : A company wants to hire 5 new employees. From previous experience they know that about 1 in 10 applicants are suitable for the jobs. What is the probability that if they interview 20 applicants they will be able to fill those 5 positions?
Consider each interview a "trial" with the only two possible outcomes: "success" (can be hired) or "failure" (not suitable). Assumptions:
1) "success probability" is the same for all applicants (as long as we know nothing else about them this is ok.)
2) trials are independent (depends somewhat on the setup of the interviews but should be ok)
Then if we let X = "number of suitable applicants in the group of 20" we have X~B(20,0.1), and using the command pbinom in R we find P(X≥5) = 1 - P(X≤4) = 1 - pbinom(4,20,0.1) = 1 - 0.9568 = 0.0432
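The calculation for the hiring example can be sketched in R as follows. The variable names are only for illustration; the second line recomputes the same probability by summing the binomial pmf as a cross-check.

```r
# X ~ Bin(20, 0.1): number of suitable applicants among 20 interviewed
n <- 20
p <- 0.1

# P(X >= 5) = 1 - P(X <= 4)
p_fill <- 1 - pbinom(4, n, p)

# cross-check: sum the pmf over x = 5, ..., 20
p_fill_sum <- sum(dbinom(5:n, n, p))

print(round(c(p_fill, p_fill_sum), 4))
```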
Example Say we want to do a mail survey, that is we send letters with questionnaires to randomly selected people and hope they fill it out and send it back. From long experience it is known that such surveys have a "return rate" of about 25%, that is only 1 in 4 people send their survey back. How many surveys do we need to send out to be 99% sure to get at least 100 back?
Say we send out n questionnaires. Let the rv X be the number of questionnaires we get back, then X~Bin(n,0.25). We need to solve the equation P(X≥100) = 0.99.
How do we find n? Here are two very different methods:
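1) "Trial and error": We have X~Bin(n,0.25) and P(X≥100) = 1 - P(X≤99), so play around with different values of n until you find the smallest one with pbinom(99,n,0.25) ≤ 0.01. (Because n is an integer, exact equality with 0.01 is generally not attainable.)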
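Instead of trying values by hand, the search can be automated. This is a minimal sketch, assuming we want the smallest n with P(X≥100) ≥ 0.99; it uses P(X≥m) = 1 - pbinom(m-1,n,p) and starts at n = m because fewer than m surveys can never yield m returns.

```r
p <- 0.25    # return rate
m <- 100     # number of returns we want

n <- m       # at least m surveys are needed
# increase n until P(X >= m) = 1 - P(X <= m - 1) reaches 0.99
while (1 - pbinom(m - 1, n, p) < 0.99) {
  n <- n + 1
}
print(n)
```

The loop stops at the first n that meets the target, so n - 1 surveys would fall just short of 99%.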
Another solution uses the normal approximation. Note that
μX = np = 0.25n and
σX = √(npq) = √(n×0.25×0.75) = 0.433√n
and so approximately X~N(0.25n, 0.433√n)
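We need n such that 0.99 = P(X≥100) = 1-P(X<100), or P(X<100)=0.01, so
0.01 = P(X<100) = P( (X-0.25n)/(0.433√n) < (100-0.25n)/(0.433√n) ) ≈ Φ( (100-0.25n)/(0.433√n) ), where Φ is the standard normal cdf,
and so
(100-0.25n)/(0.433√n) = qnorm(0.01) = -2.326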
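now:
100-0.25n = -2.326×0.433√n = -1.0072√n
(100-0.25n)² = 1.0144n
0.0625n² - 51.0144n + 10000 = 0
n = (51.0144 ± √(51.0144² - 4×0.0625×10000))/(2×0.0625) = (51.0144 ± 10.12)/0.125
which gives either n=(51.0144-10.12)/0.125=327 or n=(51.0144+10.12)/0.125=489.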
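The quadratic equation gives us two possible solutions because solving for n involves squaring, which also admits the root with 100-0.25n > 0. So let's check which one is right. We find pbinom(100,327,0.25)=0.9906 and pbinom(100,489,0.25)=0.0103, which is close to the target 0.01, so we see n=489 is the correct answer.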
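The same quadratic can be solved in R without rounding the intermediate constants. This is a sketch only; the names a, b, c0 and lhs are chosen here for illustration. The last step plugs both roots back into the unsquared equation to see which one reproduces qnorm(0.01).

```r
z <- qnorm(0.01)            # about -2.326
s <- sqrt(0.25 * 0.75)      # about 0.433

# (100 - 0.25 n)^2 = (z s)^2 n  <=>  a n^2 + b n + c0 = 0
a  <- 0.25^2
b  <- -(2 * 100 * 0.25 + (z * s)^2)
c0 <- 100^2

roots <- (-b + c(-1, 1) * sqrt(b^2 - 4 * a * c0)) / (2 * a)
print(round(roots, 1))

# left-hand side of the unsquared equation at each root;
# only the root where this equals z (a negative number) is a solution
lhs <- (100 - 0.25 * roots) / (s * sqrt(roots))
print(rbind(roots = roots, lhs = lhs, target = z))
```

The smaller root gives a positive left-hand side, so it only solves the squared equation.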
The main advantage of the last solution is that it is quite general. Say this company sends out questionnaires all the time, but with different return rates p, different desired number of returns m and a different probability pm of at least m returns. Repeating the above calculation for this general case, with q=1-p and z=qnorm(1-pm), we find
n = [ (-z·√(p·q) + √(z²·p·q + 4·p·m)) / (2·p) ]²
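For the general case a small R function is handy. This is a sketch under the normal approximation; the function name survey_n and its argument names are made up for this illustration. It solves (m - np)/√(npq) = qnorm(1 - pm) as a quadratic in √n and keeps the positive root.

```r
# Approximate number of surveys needed so that P(at least m returns) = pm,
# for return rate p, using the normal approximation to the binomial.
survey_n <- function(p, m, pm) {
  q <- 1 - p
  z <- qnorm(1 - pm)   # negative when pm > 0.5
  # p*u^2 + z*sqrt(p*q)*u - m = 0 with u = sqrt(n); take the positive root
  u <- (-z * sqrt(p * q) + sqrt(z^2 * p * q + 4 * p * m)) / (2 * p)
  u^2
}

n_approx <- survey_n(0.25, 100, 0.99)
print(n_approx)
```

The function returns the unrounded solution; in practice one would round up and, if it matters, confirm with pbinom.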

Example (same as above) How many applicants will the company need to interview to be 90% sure to be able to fill at least one of the five positions?
If we let Y be the number of trials until the first success (= an applicant is suitable) we have Y~G(0.1). Then
P(Y≤n) = 1 - P(Y>n) = 1 - 0.9^n ≥ 0.9, so 0.9^n ≤ 0.1, or n ≥ log(0.1)/log(0.9) = 21.85, so n=22.
We could also have used the command qgeom to do this in R: qgeom(0.9,0.1) +1 = 21 +1 = 22
Note The geom functions in R (dgeom, pgeom, qgeom, rgeom) are for a r.v. Y* = Y-1, that is it takes values 0,1,... instead of 1,2,... and P(Y*=k)=P(Y=k+1)
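The shift between Y and R's Y* is easy to get wrong, so here is a short sketch that finds the number of interviews both ways: directly from P(Y≤n) = 1 - (1-p)^n, and via qgeom with the +1 correction. The variable names are illustrative.

```r
p <- 0.1

# direct: smallest n with 1 - (1 - p)^n >= 0.9
n_direct <- ceiling(log(1 - 0.9) / log(1 - p))

# R's geometric counts failures before the first success, so add 1
n_qgeom <- qgeom(0.9, p) + 1

# check: P(Y <= n) = P(Y* <= n - 1) should be at least 0.9
p_check <- pgeom(n_qgeom - 1, p)

print(c(n_direct = n_direct, n_qgeom = n_qgeom, p_check = p_check))
```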
In general the geometric rv. is a model for "lifetimes" or "times until failure" of components, that is for the number of time periods until a component fails. But how do we know in real life whether the geometric might be a good model for a specific case? The next theorem helps:
Theorem. Say X is a discrete rv. on {1,2,3,..} Then P(X>k)=P(X>k+j|X>j) for all k and j iff X~G(p).
Note P(X>k)=P(X>k+j|X>j) for all k and j is called the memoryless property, and the theorem states that for discrete rv.s on the positive integers this property is unique to the geometric rv.
proof Say X~G(p) and write q=1-p, then
P(X>k) = 1 - P(X≤k) = 1 - ∑_{i=1}^{k} p·q^(i-1) = 1 - p·∑_{i=0}^{k-1} q^i =
1 - p·(1 - q^k)/(1 - q) = q^k
and
P(X>k+j | X>j) = P(X>k+j, X>j)/P(X>j) = P(X>k+j)/P(X>j) = q^(k+j)/q^j = q^k = P(X>k)
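Now assume X takes values in {1,2,..} and has the memoryless property. Let the event A={X>1} and q=P(A), then for any k
P(X>k+1) = P(X>k+1, A) = P(X>k+1 | X>1)·P(A) = P(X>k)·q
so by induction P(X>k) = q^k for all k, and therefore P(X=k) = P(X>k-1) - P(X>k) = q^(k-1) - q^k = (1-q)·q^(k-1), which is the pmf of G(p) with p=1-q.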
So the geometric is a reasonable model if it is reasonable to assume an experiment has the memoryless property.
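The memoryless property can also be checked numerically. The sketch below uses arbitrary illustrative values of p, k and j, and a small helper surv (a name made up here) for P(X>k) on {1,2,..}, built from pgeom with the shift by one that R's parametrization requires.

```r
p <- 0.3
k <- 4
j <- 7

# P(X > k) for X on {1, 2, ...}; R's pgeom is for X - 1
surv <- function(k, p) pgeom(k - 1, p, lower.tail = FALSE)

lhs <- surv(k, p)                    # P(X > k)
rhs <- surv(k + j, p) / surv(j, p)   # P(X > k + j | X > j)

print(c(lhs = lhs, rhs = rhs, q_to_k = (1 - p)^k))
```

All three numbers agree, whatever k and j are chosen.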
Example Say we want to model the number of days until a light bulb burns out. Is the geometric a good model for this? The question is whether the number of days has the memoryless property?
Example Say we want to model the number of years until a person dies. Is the geometric a good model for this? The question is whether the number of years has the memoryless property?
Note as with the geometric, the nbinom functions in R (dnbinom, pnbinom, qnbinom, rnbinom) use a slightly different parametrization: they are for a r.v. Y* = Y-r, the number of failures before the r-th success.
Example (same as above) How many applicants will the company need to interview to be 90% sure to be able to fill all of the five positions?
If we let Y be the number of trials until the 5th success we have Y~NB(0.1,5). Then using R we find qnbinom(0.9,5,0.1) + 5 = 73 + 5 = 78. (Note: it is not 5*20=100!)
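Here is the same calculation as a short R sketch, with a cross-check: needing at most n interviews for 5 hires is the same event as getting at least 5 suitable applicants among n, so the negative binomial and binomial answers must agree. Variable names are illustrative.

```r
r <- 5      # positions to fill
p <- 0.1    # probability an applicant is suitable

# qnbinom counts failures before the r-th success, so add r for total trials
failures <- qnbinom(0.9, size = r, prob = p)
n_needed <- failures + r

# P(Y <= n_needed) via the negative binomial ...
p_nb <- pnbinom(n_needed - r, size = r, prob = p)
# ... and via the binomial: at least r successes in n_needed trials
p_bin <- 1 - pbinom(r - 1, n_needed, p)

print(c(n_needed = n_needed, p_nb = p_nb, p_bin = p_bin))
```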
Example say our company has a pool of 100 candidates for the job, 10 of whom are suitable for hiring. If they interview 50 of the 100, what is the probability that they will fill the 5 positions?
Here X~HG(50,10,90) and so P(X≥5) = 1- P(X≤4) = 1 - phyper(4,10,90,50) = 1 - 0.3703 = 0.6297
Note: the difference between the binomial and the hypergeometric distribution is that here we draw the balls without repetition. Of course, if n is small compared to N+M the probability of drawing the same ball twice is (almost) 0, so then the two distributions give the same answer.
Example using the binomial distribution for our Example we would have found P(X≥5) = 1 - pbinom(4,50,0.1) = 1 - 0.4312 = 0.5688, quite different from the hypergeometric. On the other hand if our candidate pool had 1000 applicants, 100 of whom are suitable we would have found P(X≥5) = 1- phyper(4,100,900,50) = 1 - 0.4269 = 0.5731.
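The three probabilities can be put side by side in R. Note the argument order of phyper(q, m, n, k): number of suitable candidates, number of unsuitable ones, number interviewed. The variable names are illustrative.

```r
# pool of 100 with 10 suitable, 50 interviewed
p_pool100 <- 1 - phyper(4, 10, 90, 50)
# pool of 1000 with 100 suitable, 50 interviewed
p_pool1000 <- 1 - phyper(4, 100, 900, 50)
# binomial, as if sampling with replacement
p_binom <- 1 - pbinom(4, 50, 0.1)

print(round(c(pool100 = p_pool100, pool1000 = p_pool1000, binomial = p_binom), 4))
```

With the larger pool the hypergeometric answer moves much closer to the binomial one, as the note above suggests.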
One way to visualize the Poisson distribution is as follows: say X ~ B(n,p) such that n is large and p is small. That is, the number of trials is large but the success probability is small. Then X is approximately Poisson with rate λ = np. Here is a proof: let B~Bin(n,p) and P~Pois(λ). Then their moment generating functions are given by
M_B(t) = (1 - p + p·e^t)^n = (1 + λ(e^t - 1)/n)^n → exp(λ(e^t - 1)) = M_P(t) as n→∞ with λ = np fixed
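And here is another proof:
P(B=0) = (1-p)^n = (1 - λ/n)^n → e^(-λ) = P(P=0)
and for x ≥ 1
P(B=x)/P(B=x-1) = (n-x+1)p/(x(1-p)) = λ(1 - (x-1)/n)/(x(1 - λ/n)) → λ/x = P(P=x)/P(P=x-1)
so the approximation works for x=0, and then the recursion relationship assures that it works for all x as well.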
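The two ingredients of this argument, the value at x=0 and the ratio of successive probabilities, can be watched converging in R. This is a sketch with an arbitrary rate λ = 2 and a few illustrative values of n, keeping p = λ/n.

```r
lambda <- 2
x <- 1:8

for (n in c(10, 100, 10000)) {
  p <- lambda / n
  # P(B = 0) versus exp(-lambda)
  gap0 <- abs(dbinom(0, n, p) - exp(-lambda))
  # binomial ratio P(B = x) / P(B = x - 1) versus the Poisson ratio lambda / x
  ratio_binom <- dbinom(x, n, p) / dbinom(x - 1, n, p)
  gap_ratio <- max(abs(ratio_binom - lambda / x))
  cat("n =", n, " gap at 0:", signif(gap0, 3),
      " max ratio gap:", signif(gap_ratio, 3), "\n")
}
```

After the loop, gap0 and gap_ratio hold the values for the largest n, and both are already very small.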
Example : say you drive from Mayaguez to San Juan. Assume that the probability that on one kilometer of highway there is a police car checking the speed is 0.04. What is the probability that you will encounter at least 3 police cars on your drive?
If we assume that the police cars appear independently (?) and take the drive to be 180 km, then X = # of police cars ~ B(180,0.04), so P(X≥3) = 1 - pbinom(2,180,0.04) = 1 - 0.0234 = 0.9766. On the other hand X is also approximately Pois(180*0.04) = Pois(7.2) and so P(X≥3) = 1 - ppois(2,7.2) = 1 - 0.0254 = 0.9746.
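In R the exact binomial answer and the Poisson approximation for this example are one line each. The variable names are illustrative.

```r
n <- 180     # kilometers, one trial per kilometer
p <- 0.04    # probability of a police car on any one kilometer

p_exact  <- 1 - pbinom(2, n, p)    # binomial
p_approx <- 1 - ppois(2, n * p)    # Poisson with rate np = 7.2

print(round(c(binomial = p_exact, poisson = p_approx), 4))
```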
The main questions with approximations are always:
1) how good is it?
2) when does it work?
play around with binpois(n,p) to see
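If the routine binpois is not at hand, the two questions can still be explored with a few lines of base R. The helper below, max_pmf_gap, is a hypothetical stand-in written for this illustration and is not the course's binpois; it reports the largest difference between the Bin(n,p) and Pois(np) probabilities.

```r
# Hypothetical helper (not the course's binpois): largest gap between the
# Bin(n, p) and Pois(n * p) pmfs over x = 0, ..., n
max_pmf_gap <- function(n, p) {
  x <- 0:n
  max(abs(dbinom(x, n, p) - dpois(x, n * p)))
}

# same rate np = 5, with n growing and p shrinking
gaps <- c(max_pmf_gap(20, 0.25),
          max_pmf_gap(200, 0.025),
          max_pmf_gap(2000, 0.0025))
print(signif(gaps, 3))
```

With np held fixed the gap shrinks as p gets smaller, which is the regime where the approximation is meant to be used.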

Example we roll a fair die 100 times. Let X1 be the number of "1"s, X2 be the number of "2"s,..,X6 be the number of "6"s. Then (X1,..,X6)~M(100,1/6,..,1/6)
Note: if n=2 we have x1+x2=m, or x2=m-x1 and p1+p2=1, so
P(X1=x1, X2=x2) = m!/(x1!·x2!) · p1^x1 · p2^x2 = m!/(x1!·(m-x1)!) · p1^x1 · (1-p1)^(m-x1)
and so X1~Bin(m,p1). The multinomial distribution is therefore a generalization of the binomial distribution where each trial has n possible outcomes.
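The reduction to the binomial for n=2 can be confirmed with dmultinom. The values m = 10 and p1 = 0.3 below are arbitrary illustrative choices.

```r
m  <- 10
p1 <- 0.3
x1 <- 0:m

# multinomial pmf with two categories, evaluated at (x1, m - x1)
multi <- sapply(x1, function(x) {
  dmultinom(c(x, m - x), size = m, prob = c(p1, 1 - p1))
})
# binomial pmf at the same points
binom <- dbinom(x1, m, p1)

print(all.equal(multi, binom))
```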
Theorem Let (X1,..,Xn)~M(m,p1,..,pn). Then the marginal distribution of Xk is Bin(m,pk)
proof:
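let's denote by Bx = {(x1,..,x_{k-1},x_{k+1},..,xn) : x1+..+x_{k-1}+x_{k+1}+..+xn = m-x}, then:
P(Xk=x) = ∑_{Bx} m!/(x1!··x_{k-1}!·x!·x_{k+1}!··xn!) · p1^x1 ·· pk^x ·· pn^xn
= m!/(x!·(m-x)!) · pk^x · (1-pk)^(m-x) · ∑_{Bx} (m-x)!/∏_{i≠k} xi! · ∏_{i≠k} (pi/(1-pk))^xi
= m!/(x!·(m-x)!) · pk^x · (1-pk)^(m-x)
where the sum is 1 because we are summing over all possible values of a multinomial rv (Y1,..,Yn-1)~M(m-x, p1/(1-pk),..,pn/(1-pk)) (with the pk term left out), or because we use the multinomial theorem from calculus.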
From this it follows that E[Xk]=mpk and V[Xk]=mpk(1-pk)
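For a small case the marginal distribution can be computed by brute force and compared with the binomial. This sketch uses an arbitrary example with n=3 categories, m=6 trials and illustrative probabilities; it enumerates every outcome, sums the multinomial probabilities over the other two coordinates, and also recovers the mean and variance.

```r
m <- 6
p <- c(0.2, 0.5, 0.3)

# all (x1, x2, x3) with x1 + x2 + x3 = m
grid <- expand.grid(x1 = 0:m, x2 = 0:m)
grid <- grid[grid$x1 + grid$x2 <= m, ]
grid$x3 <- m - grid$x1 - grid$x2
grid$prob <- apply(grid[, c("x1", "x2", "x3")], 1,
                   function(x) dmultinom(x, size = m, prob = p))

# marginal pmf of X2: sum over the other coordinates
marg <- as.numeric(tapply(grid$prob, grid$x2, sum))

mean_x2 <- sum(0:m * marg)
var_x2  <- sum((0:m)^2 * marg) - mean_x2^2
print(rbind(marginal = marg, binomial = dbinom(0:m, m, p[2])))
print(c(mean = mean_x2, var = var_x2))
```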
Theorem Let (X1,..,Xn)~M(m,p1,..,pn). Then the conditional distribution of the remaining components given Xk=x is (X1,..,X_{k-1},X_{k+1},..,Xn)|Xk=x ~ M(m-x, p1/(1-pk),..,pn/(1-pk)), where the pk term is left out.
Theorem Let (X1,..,Xn)~M(m,p1,..,pn). Then Cov(Xi,Xj) = -m·pi·pj for i≠j
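The negative covariance can be seen in a simulation of the die example: more "1"s leave fewer rolls for "2"s. This is a sketch using rmultinom; the seed and the number of simulated experiments are arbitrary choices, so the simulated value only approximates the theoretical one.

```r
set.seed(1)
m <- 100
p <- rep(1/6, 6)

# each column is one experiment of 100 rolls: a 6 x 100000 matrix of counts
sims <- rmultinom(1e5, size = m, prob = p)

cov12  <- cov(sims[1, ], sims[2, ])   # simulated Cov(X1, X2)
theory <- -m * p[1] * p[2]            # -100/36

print(c(simulated = cov12, theory = theory))
```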