Random Variables and Distributions

A random variable (rv) is a variable whose value is a numerical outcome of an experiment. We denote random variables by X, Y, ...

Example 1: We roll a fair die. X is the number shown on the die.
Example 2: We roll two fair dice. X is the sum of the dice.
Example 3: We roll two fair dice. X is the absolute difference of the dice.
Example 4: We roll two fair dice. X is the number of "sixes".
Example 5: We randomly choose 10 employees of WRInc. X is the number of employees in the sample with a job level of 1.

Discrete Random Variables

A rv X is called discrete if observations drawn from it would be discrete data.

Example 1 above: X takes one of the values {1,2,3,4,5,6}, so if we had 1000 observations drawn from this rv they would be only these numbers, repeated many times.
Example 6: We repeatedly roll a fair die. X is the number of rolls needed until the first "six", so X takes one of the values {1,2,3, ...}
Here X can theoretically take any positive integer value, but in practice we will mostly see small values (1-10 or so), and these would repeat many times.
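
To see what "discrete data" looks like here, we can simulate Examples 1 and 6 on a computer. The following is only a sketch: the function names and the seed are our own choices for illustration, not part of the examples.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed (an arbitrary choice) so runs are repeatable

def roll_die():
    """One roll of a fair six-sided die (Example 1)."""
    return rng.randint(1, 6)

def rolls_until_six():
    """Example 6: number of rolls up to and including the first six."""
    count = 1
    while roll_die() != 6:
        count += 1
    return count

# 1000 observations from each random variable
die_obs = [roll_die() for _ in range(1000)]
wait_obs = [rolls_until_six() for _ in range(1000)]

# Tabulate how often each distinct value occurred
die_counts = Counter(die_obs)
wait_counts = Counter(wait_obs)

print(sorted(die_counts.items()))
print(sorted(wait_counts.items()))
```

The first table can only contain the values 1 to 6. The second has no upper limit in principle, but most of the counts should sit at small values.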

The probability mass function (pmf) of a discrete random variable X is a list of the values that X takes and their respective probabilities.

Example 1 above:
x      1     2     3     4     5     6
P(x)   1/6   1/6   1/6   1/6   1/6   1/6

Example 4 above:
x      0       1       2
P(x)   25/36   10/36   1/36

Notice: the probabilities always sum up to 1!
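
For dice examples like these, the pmf can be checked by listing all 36 equally likely outcomes of the two dice and adding 1/36 for each one. A minimal sketch (the helper name pmf_two_dice is made up for this illustration):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pmf_two_dice(g):
    """pmf of X = g(d1, d2) for two fair dice, by listing all 36 outcomes."""
    pmf = defaultdict(Fraction)  # missing values start at probability 0
    for d1, d2 in product(range(1, 7), repeat=2):
        pmf[g(d1, d2)] += Fraction(1, 36)  # each outcome is equally likely
    return dict(sorted(pmf.items()))

# Example 4: number of sixes
sixes = pmf_two_dice(lambda a, b: (a == 6) + (b == 6))
# Example 2: sum of the dice
total = pmf_two_dice(lambda a, b: a + b)

for x, p in sixes.items():
    print(x, p)  # Fraction prints in lowest terms, so 10/36 shows as 5/18

# The probabilities of any pmf add up to 1
print(sum(sixes.values()), sum(total.values()))
```

Working with exact fractions instead of decimals avoids rounding, so the check that the probabilities sum to 1 is exact.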

Example We randomly select 2 employees of WR Inc. Let the random variable X be the number of females selected. Find the pmf of X.

What are the possible values of X? 0, 1 and 2 (none, exactly one and two females). What are the probabilities of these outcomes?

P(X=0) = P(both people selected are male) = P(first is male and second is male) = P(first is male)P(second is male|first is male) = 321/527·320/526 = 0.3706
P(X=2) = P(both people selected are female) = P(first is female and second is female) = P(first is female)P(second is female|first is female) = 206/527·205/526 = 0.1523

So we get the following table:
x      0        1        2
P(x)   0.3706   ?        0.1523
but the probabilities have to sum up to 1, so the one missing is
P(X=1) = 1 - 0.3706 - 0.1523 = 0.4771 and we have
x      0        1        2
P(x)   0.3706   0.4771   0.1523
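
As a check, P(X=1) can also be found directly with the multiplication rule instead of by subtraction. The sketch below uses the counts from the calculation above (321 males and 206 females among 527 employees); the variable names are our own.

```python
from fractions import Fraction

MALES, FEMALES = 321, 206  # counts used in the calculation above
TOTAL = MALES + FEMALES    # 527 employees

# Multiplication rule: P(first) * P(second | first)
p0 = Fraction(MALES, TOTAL) * Fraction(MALES - 1, TOTAL - 1)
p2 = Fraction(FEMALES, TOTAL) * Fraction(FEMALES - 1, TOTAL - 1)

# Exactly one female: female then male, or male then female
p1 = (Fraction(FEMALES, TOTAL) * Fraction(MALES, TOTAL - 1)
      + Fraction(MALES, TOTAL) * Fraction(FEMALES, TOTAL - 1))

for x, p in enumerate([p0, p1, p2]):
    print(x, round(float(p), 4))

print(p0 + p1 + p2)  # exact sum of the three probabilities
```

Both routes give the same value for P(X=1), and the three exact fractions add up to 1.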

Example 6 above: here we have infinitely many outcomes, so we need to write the pmf a little differently, namely using an equation instead of a table. First let A_i = "no six on the ith roll", so that its complement A_i^c = "six on the ith roll". Then we find

P(X=1) = P(six on first roll) = P(A_1^c) = 1/6

P(X=2) = P(no six on first roll, six on second roll)
= P(A_1 A_2^c)
= P(A_1) P(A_2^c) (by independence)
= 5/6 · 1/6

P(X=3) = P(A_1 A_2 A_3^c)
= P(A_1) P(A_2) P(A_3^c) (by independence)
= 5/6 · 5/6 · 1/6

It's easy to guess the general case:

P(X=k) = (5/6)^{k-1} × 1/6, k = 1, 2, ...
So for example
P(first six on the 5th roll) = P(X=5) = (5/6)^{5-1} × 1/6 = 0.0804
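
The general formula is easy to turn into a small function. This is a sketch only; the name geometric_pmf and the default p = 1/6 are our own choices for the die example.

```python
def geometric_pmf(k, p=1/6):
    """P(X = k): first success on trial k, with success probability p."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

# First six on the 5th roll
print(round(geometric_pmf(5), 4))  # 0.0804

# A pmf must sum to 1. With infinitely many values we can only add up
# finitely many terms, but the first 200 already get very close to 1.
partial = sum(geometric_pmf(k) for k in range(1, 201))
print(partial)
```

Even though X has infinitely many possible values, the probabilities shrink fast enough that their sum is still 1.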

Continuous Random Variables

A random variable X is called continuous if observations drawn from it would be continuous data.

Example We do a survey and ask people their annual income. Possible answers are {57345, 23950, 104607, ...}

The equivalent of the probability mass function for a continuous rv is called the density (or probability density function, pdf). Probabilities for continuous rv's are found by finding areas under the density.

Example Consider the following density:

Now if X is a rv with this density and we need P(2≤X≤4), then that is the shaded area on the left, and P(6≤X≤8) is the shaded area on the right.

Generally finding areas under curves is a Calculus problem (Integration) and beyond the scope of this class.
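
Although we will not do the calculus by hand, a computer can approximate such areas numerically. The sketch below does NOT use the density from the graph, whose formula is not given here. Instead it assumes a stand-in with a similar two-peak shape: a mixture of two normal curves centered at 3 and 7, with the second carrying twice the weight. The area is approximated with the trapezoid rule.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def density(x):
    """Hypothetical two-peak density (an assumption, not the one in the graph)."""
    return normal_pdf(x, 3, 1) / 3 + 2 * normal_pdf(x, 7, 1) / 3

def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

print(round(area(density, 2, 4), 3))    # P(2 <= X <= 4) under the assumed density
print(round(area(density, 6, 8), 3))    # P(6 <= X <= 8) under the assumed density
print(round(area(density, -5, 15), 3))  # total area, which should be close to 1
```

The total area under any density is 1, which plays the same role as the probabilities of a pmf summing to 1.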

How do you "read" a density? What is it telling us? Consider the density above.
This density has a number of features:

• the curve is highest around x=7, so if we had observations drawn from this density we would expect many of them to be around 7
• the curve has a valley at around x=5, so only a few observations should be around 5.
• there is another peak at x=3, but it is only about half as high as the peak at x=7, so we would expect about half as many observations around 3 as there are around 7.
• the density is very low (0?) for x<0 or x>10, so there should be no observations outside the range (0,10).

There is a connection between densities and histograms. If we have a large data set and we draw a histogram (which is properly scaled) and then draw the density on top, the density should follow the top of the histogram. In the next graph you see a histogram of 1000 observations drawn from the density above, overlaid by the density:
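
The same comparison can be tried numerically. The sketch below again assumes a stand-in density (a mixture of normal curves at 3 and 7, not the actual density in the graph), draws 1000 observations from it, and scales the histogram so that the bar areas add up to 1. The seed, bin width and range are arbitrary choices.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def density(x):
    """Hypothetical two-peak density (an assumption, not the one in the graph)."""
    return normal_pdf(x, 3, 1) / 3 + 2 * normal_pdf(x, 7, 1) / 3

rng = random.Random(0)  # fixed seed so runs are repeatable

def draw():
    """One observation: pick a peak (1/3 vs 2/3), then draw around it."""
    if rng.random() < 1 / 3:
        return rng.gauss(3, 1)
    return rng.gauss(7, 1)

sample = [draw() for _ in range(1000)]

# Properly scaled histogram: bar height = count / (n * bin width)
width = 1.0
lo, hi = -2.5, 12.5
counts = [0] * int((hi - lo) / width)
for x in sample:
    if lo <= x < hi:
        counts[int((x - lo) // width)] += 1
heights = [c / (len(sample) * width) for c in counts]

# Compare each bar with the density at the middle of its bin
for i, h in enumerate(heights):
    mid = lo + (i + 0.5) * width
    print(f"{mid:5.1f}  histogram {h:.3f}  density {density(mid):.3f}")
```

The two columns should be close but not identical: the histogram belongs to one particular sample, so it changes if we draw new data, while the density stays fixed.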

By the way, a density with this shape is called bimodal. Can you think of any real life data that might have this shape?

Population vs Sample

The graph above shows the basic idea here: the histogram describes the actual sample, the density describes the population from which the sample is drawn.

For more on random variables see page 232 of the textbook.