is said to have a Binomial distribution with parameters n and p. We have
Example : A company wants to hire 5 new employees. From previous experience they know that about 1 in 10 applicants are suitable for the jobs. What is the probability that if they interview 20 applicants they will be able to fill those 5 positions?
Consider each interview a "trial" with only two possible outcomes: "success" (can be hired) or "failure" (not suitable). Assumptions:
1) "success probability" is the same for all applicants (as long as we know nothing else about them this is ok.)
2) trials are independent (depends somewhat on the setup of the interviews but should be ok)
Then, if we let X = "number of suitable applicants in the group of 20", we have X~B(20,0.1), and using the command pbinom in R we find P(X≥5) = 1 - P(X≤4) = 1 - pbinom(4,20,0.1) = 1 - 0.9568 = 0.0432
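A minimal sketch of this calculation in R, assuming the question is read as P(X≥5), that is, at least 5 of the 20 applicants are suitable. The variable names are illustrative only.

```r
# X ~ B(20, 0.1): number of suitable applicants among 20 interviews
n <- 20
p <- 0.1

# P(X >= 5) = 1 - P(X <= 4)
p_fill <- 1 - pbinom(4, n, p)

# The same probability via the upper tail, and by summing the pmf directly
p_fill_upper <- pbinom(4, n, p, lower.tail = FALSE)
p_fill_sum <- sum(dbinom(5:n, n, p))

round(p_fill, 4)   # about 0.0432
```

All three versions agree, so which one to use is a matter of taste; the upper-tail form avoids the subtraction when the probability is very small.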
Example (same as above) How many applicants will the company need to interview to be 90% sure to be able to fill at least one of the five positions?
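If we let Y be the number of trials until the first success (= an applicant is suitable) we have Y~G(0.1). Then P(Y≤k) = 1 - P(Y>k) = 1 - 0.9^k, because Y>k means that the first k applicants are all unsuitable. We need 1 - 0.9^k ≥ 0.9, or 0.9^k ≤ 0.1, so k ≥ log(0.1)/log(0.9) = 21.85, and the company needs to interview k = 22 applicants.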
We could also have used the command qgeom to do this in R: qgeom(0.9,0.1) +1 = 21 +1 = 22
Note: the geom commands in R (dgeom, pgeom, qgeom, rgeom) are for a r.v. Y* = Y-1, the number of failures before the first success. That is, it takes values 0,1,... instead of 1,2,... and P(Y*=k)=P(Y=k+1)
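The shift between Y and Y* is easy to get wrong, so here is a short sketch that finds the number of interviews two ways and checks the result. It assumes p = 0.1 and a 90% target as in the example; the variable names are our own.

```r
p <- 0.1        # success probability per interview
target <- 0.9   # required confidence

# Y = number of trials until the first success, so P(Y <= k) = 1 - (1 - p)^k.
# Smallest integer k with 1 - (1 - p)^k >= target:
k_direct <- ceiling(log(1 - target) / log(1 - p))

# R's geometric functions count failures before the first success (Y* = Y - 1),
# so add 1 to turn the quantile back into a number of trials.
k_qgeom <- qgeom(target, p) + 1

# Check: P(Y <= k) = P(Y* <= k - 1)
pgeom(k_qgeom - 1, p)   # at least 0.9
pgeom(k_qgeom - 2, p)   # below 0.9, so one interview fewer is not enough

# Shifted pmf: P(Y* = j) = P(Y = j + 1) = p * (1 - p)^j
j <- 0:5
all.equal(dgeom(j, p), p * (1 - p)^j)
```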
Note: as with the geometric, the nbinom commands in R (dnbinom, pnbinom, qnbinom, rnbinom) use a slightly different parametrization. They are for a r.v. Y* = Y-r, the number of failures before the r-th success.
Example (same as above) How many applicants will the company need to interview to be 90% sure to be able to fill all of the five positions?
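If we let Y be the number of trials until the 5th success we have Y~NB(0.1,5). Then using R we find qnbinom(0.9,5,0.1) = 73 failures, so the company needs to interview 73 + 5 = 78 applicants. (Note: this is not 5 times the 22 interviews needed for one position, 5*22 = 110!)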
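As a sketch, the same answer in R together with a cross-check through the binomial distribution. The check uses the fact that Y ≤ m exactly when the first m trials contain at least r successes; the variable names are illustrative.

```r
p <- 0.1   # success probability per interview
r <- 5     # positions to fill

# qnbinom works with Y* = Y - r, the number of failures before the r-th success
failures <- qnbinom(0.9, size = r, prob = p)
trials <- failures + r

# Cross-check: P(Y <= m) = P(at least r successes in m trials)
check <- 1 - pbinom(r - 1, trials, p)            # P(Y <= trials)
check_prev <- 1 - pbinom(r - 1, trials - 1, p)   # P(Y <= trials - 1)

c(failures = failures, trials = trials)
```

Here check is just above 0.9 and check_prev is just below it, which confirms that trials is the smallest number of interviews that works.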
Example say our company has a pool of 100 candidates for the job, 10 of whom are suitable for hiring. If they interview 50 of the 100, what is the probability that they will fill the 5 positions?
Here X~HG(50,10,90) and so P(X≥5) = 1- P(X≤4) = 1 - phyper(4,10,90,50) = 1 - 0.3703 = 0.6297
Note: the difference between the binomial and the hypergeometric distribution is that here we draw without repetition, so the same candidate cannot be picked twice. Of course, if n is small compared to N+M, the probability of drawing the same one twice is (almost) 0 even with repetition, so then the two distributions give almost the same answer.
Example: had we used the binomial distribution for the example above, we would have found P(X≥5) = 1 - pbinom(4,50,0.1) = 1 - 0.4312 = 0.5688, quite different from the hypergeometric answer of 0.6297. On the other hand, if our candidate pool had 1000 applicants, 100 of whom are suitable, we would have found P(X≥5) = 1 - phyper(4,100,900,50) = 1 - 0.4269 = 0.5731, which is close to the binomial answer.
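The comparison can be run side by side in R. This is only a sketch with illustrative variable names; the arguments of phyper are (q, number of suitable candidates, number of unsuitable candidates, number interviewed).

```r
n <- 50   # number of candidates interviewed

# Pool of 100 with 10 suitable, drawn without repetition
p_hyper_100 <- 1 - phyper(4, 10, 90, n)

# Pool of 1000 with 100 suitable, drawn without repetition
p_hyper_1000 <- 1 - phyper(4, 100, 900, n)

# Binomial with p = 0.1, which corresponds to drawing with repetition
p_binom <- 1 - pbinom(4, n, 0.1)

round(c(pool_100 = p_hyper_100, pool_1000 = p_hyper_1000, binomial = p_binom), 4)
```

The larger the pool relative to the 50 interviews, the closer the hypergeometric probability moves to the binomial one.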
One way to visualize the Poisson distribution is as follows: say X ~ B(n,p) such that n is large and p is small. That is, the number of trials is large but the success probability is small. Then X is approximately Poisson with rate λ = np.
Example : say you drive from Mayaguez to San Juan. Assume that the probability that on one kilometer of highway there is a police car checking the speed is 0.04. What is the probability that you will encounter at least 3 police cars on your drive?
Treat each of the 180 kilometers of the drive as a trial. If we assume that the police cars appear independently (?) then X = # of police cars ~ B(180,0.04), so P(X≥3) = 1 - pbinom(2,180,0.04) = 1 - 0.0235 = 0.9765. On the other hand X is also approximately P(180*0.04) = P(7.2) and so P(X≥3) = 1 - ppois(2,7.2) = 1 - 0.0255 = 0.9745.
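A short sketch comparing the exact binomial probability with its Poisson approximation in R, assuming 180 one-kilometer trials as above. The manual line writes out ppois(2, λ) = exp(-λ)(1 + λ + λ²/2) to show what the command computes; the variable names are our own.

```r
n <- 180    # kilometers, each treated as one trial
p <- 0.04   # probability of a police car on a given kilometer
lambda <- n * p

p_binom <- 1 - pbinom(2, n, p)    # exact binomial
p_pois <- 1 - ppois(2, lambda)    # Poisson approximation

# ppois(2, lambda) written out term by term
p_pois_manual <- 1 - exp(-lambda) * (1 + lambda + lambda^2 / 2)

round(c(binomial = p_binom, poisson = p_pois), 4)
```

The two answers differ by about 0.002, which is typical for an n this large and a p this small.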