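Example: we roll a fair die until the first time we get a six. What is the expected number of rolls?
We saw that $f(x) = \frac{1}{6}\left(\frac{5}{6}\right)^{x-1}$ if $x \in \{1,2,\dots\}$. Here we just have $g(x)=x$, so
$$EX = \sum_{x=1}^{\infty} x \cdot \frac{1}{6}\left(\frac{5}{6}\right)^{x-1}$$
How do we compute this sum? Here is a "standard" trick: for $|t|<1$
$$\sum_{x=1}^{\infty} x t^{x-1} = \frac{d}{dt}\sum_{x=0}^{\infty} t^{x} = \frac{d}{dt}\frac{1}{1-t} = \frac{1}{(1-t)^2}$$
and so we find
$$EX = \frac{1}{6}\cdot\frac{1}{(1-5/6)^2} = 6$$
This is a special example of a geometric rv, that is a discrete rv X with pmf $f(x)=p(1-p)^{x-1}$, $x=1,2,\dots$
Note that if we replace 1/6 above with p, we can show that $EX = 1/p$.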
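As a quick check of the result EX = 1/p, here is a small R sketch. It assumes p = 1/6 as in the die example; the names `p`, `rolls` and `exact` are only for illustration. Note that `rgeom` counts the failures before the first success, so we add 1 to get the number of rolls.

```r
set.seed(1)
p <- 1/6
# rgeom returns the number of failures before the first success, so add 1
rolls <- rgeom(100000, p) + 1
mean(rolls)                                      # should be close to 1/p = 6
# the same expectation from the (truncated) sum of x * f(x)
exact <- sum((1:1000) * p * (1 - p)^(0:999))
exact
```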
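Example Say (X,Y) is a discrete rv with joint pmf $f(x,y)=cp^x$, $x,y \in \{0,1,\dots\}$, $y\le x$, and $0<p<1$. Find c.
We already did that before by summing first over y and then over x. We can use the above for an even simpler proof:
$$1 = \sum_{x=0}^{\infty}\sum_{y=0}^{x} cp^x = c\sum_{x=0}^{\infty}(x+1)p^x = \frac{c}{1-p}\sum_{k=1}^{\infty} k(1-p)p^{k-1} = \frac{c}{1-p}E[G] = \frac{c}{(1-p)^2}$$
where G is a geometric rv with rate 1-p, so $E[G]=1/(1-p)$ and therefore $c=(1-p)^2$.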
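A short numerical sanity check that $c=(1-p)^2$ normalizes the pmf. This is only a sketch: the value p = 0.3 and the truncation point 2000 are arbitrary choices, not part of the example.

```r
p <- 0.3
x <- 0:2000
# for each x there are x + 1 admissible values of y (y = 0, ..., x)
total <- sum((x + 1) * (1 - p)^2 * p^x)
total   # should be 1 up to rounding
```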
X is said to have a uniform [A,B] distribution if $f(x)=1/(B-A)$ for $A<x<B$, 0 otherwise.
Find $EX^k$ (this is called the kth moment of X).
$$EX^k = \int_A^B \frac{x^k}{B-A}\,dx = \frac{B^{k+1}-A^{k+1}}{(k+1)(B-A)}$$
Some special expectations are the mean of X, defined by $\mu=EX$, and the variance, defined by $\sigma^2=V(X)=E(X-\mu)^2$. Related to the variance is the standard deviation $\sigma$, the square root of the variance.
Here are some formulas for expectations:
$E(aX+b) = aEX+b$
$E(X+Y) = EX+EY$
$V(aX+b) = a^2V(X)$
$V(X) = EX^2-(EX)^2$
The last one is a useful formula for finding the variance and/or the standard deviation. Here is its proof:
$V(X) = E[(X-\mu)^2] = E[X^2-2X\mu+\mu^2] = EX^2-2\mu EX+\mu^2 = EX^2-\mu^2$
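Example: find the mean and the standard deviation of a uniform [A,B] r.v.
Integrating $x^k/(B-A)$ over [A,B] for k=1 and k=2 gives $EX=\frac{B^2-A^2}{2(B-A)}=\frac{A+B}{2}$ and $EX^2=\frac{B^3-A^3}{3(B-A)}=\frac{A^2+AB+B^2}{3}$, so
$$V(X)=\frac{A^2+AB+B^2}{3}-\frac{(A+B)^2}{4}=\frac{(B-A)^2}{12},\quad \sigma=\frac{B-A}{\sqrt{12}}$$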
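The standard values for the uniform [A,B] distribution, mean (A+B)/2 and standard deviation (B-A)/√12, can be checked by simulation. In this sketch the endpoints A = 2 and B = 5 are arbitrary choices for illustration.

```r
set.seed(1)
A <- 2; B <- 5
u <- runif(100000, A, B)
c(simulated = mean(u), formula = (A + B) / 2)        # mean
c(simulated = sd(u),   formula = (B - A) / sqrt(12)) # standard deviation
```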
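Example Find the mean and the standard deviation of an exponential rv with rate $\lambda$.
With density $f(x)=\lambda e^{-\lambda x}$, $x>0$, integration by parts gives
$$EX=\int_0^\infty x\lambda e^{-\lambda x}dx=\frac{1}{\lambda},\quad EX^2=\int_0^\infty x^2\lambda e^{-\lambda x}dx=\frac{2}{\lambda^2}$$
so $V(X)=2/\lambda^2-1/\lambda^2=1/\lambda^2$ and $\sigma=1/\lambda$.
Let's use R to check. Note that rexp is parametrized by the rate as well, so its second argument is $\lambda$ itself (and the mean is $1/\lambda$).
Say $\lambda=2$, so both the mean and the standard deviation should be close to 0.5:
y=rexp(10000,2)
mean(y)
sqrt(var(y))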
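One way to "link" probabilities and expectations is via the indicator function $I_A$ defined as
$$I_A(x)=\begin{cases}1 & \text{if } x\in A\\ 0 & \text{if } x\notin A\end{cases}$$
because with this we have for a continuous r.v. X with density f:
$$E[I_A(X)]=\int_{-\infty}^{\infty}I_A(x)f(x)\,dx=\int_A f(x)\,dx=P(X\in A)$$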
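Lemma Say we have a nonnegative rv X, that is $P(X\ge 0)=1$. Then $P(X=0)=1$ iff $EX=0$.
Say $P(X=0)=1$, then X is a discrete rv with pmf $f(0)=1$ and so $EX=0\cdot 1=0$.
Say $EX=0$. Assume $P(X=0)<1$, then there exist $\delta>0$ and $\varepsilon>0$ such that $P(X>\delta)>\varepsilon$. Then if X is discrete
$$EX=\sum_x xf(x)\ge\sum_{x>\delta}xf(x)\ge\delta\sum_{x>\delta}f(x)=\delta P(X>\delta)>\delta\varepsilon>0$$
and if X is continuous
$$EX=\int_0^\infty xf(x)\,dx\ge\int_\delta^\infty xf(x)\,dx\ge\delta\int_\delta^\infty f(x)\,dx=\delta P(X>\delta)>\delta\varepsilon>0$$
In either case we have a contradiction with EX=0.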
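Example Let (X,Y) be a discrete random vector with $f(x,y) = (1/2)^{x+y}$, $x\ge 1$, $y\ge 1$. Find $E[XY^2]$.
The pmf factors, so
$$E[XY^2]=\sum_{x=1}^\infty\sum_{y=1}^\infty xy^2\left(\tfrac12\right)^{x+y}=\left[\sum_{x=1}^\infty x\left(\tfrac12\right)^x\right]\left[\sum_{y=1}^\infty y^2\left(\tfrac12\right)^y\right]$$
The first sum is the mean of a geometric rv with p=1/2, so it is equal to 2. For the second we use the same trick as before: $\sum_{y\ge 1} y^2t^{y-1}=\frac{d}{dt}\frac{t}{(1-t)^2}=\frac{1+t}{(1-t)^3}$, which is 12 at t=1/2, so the second sum is $\frac12\cdot 12=6$ and $E[XY^2]=2\cdot 6=12$.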
and the R check:
x=rgeom(100000,1/2)+1
y=rgeom(100000,1/2)+1
mean(x*y^2)
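Besides the simulation, the expectation can also be computed directly from the pmf. The following sketch truncates the two sums at 200 terms (an arbitrary cutoff, the remaining terms are negligible); the names `EX` and `EY2` are only for illustration.

```r
k <- 1:200
EX  <- sum(k   * 0.5^k)   # sum of x   * (1/2)^x
EY2 <- sum(k^2 * 0.5^k)   # sum of y^2 * (1/2)^y
EX * EY2                  # E[X Y^2], since the pmf factors
```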
The covariance of X and Y is defined by $cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]$.
Note $cov(X,X) = V(X)$
As with the variance we have a simpler formula for actual calculations: cov(X,Y) = E(XY) - (EX)(EY)
Example: take the example of the sum and absolute value of the difference of two rolls of a die. What is the covariance of X and Y?
So we have
$\mu_X = EX = 2\cdot 1/36 + 3\cdot 2/36 + \dots + 12\cdot 1/36 = 7.0$
$\mu_Y = EY = 0\cdot 6/36 + 1\cdot 10/36 + \dots + 5\cdot 2/36 = 70/36$
$EXY = 0\cdot 2\cdot 1/36 + 1\cdot 2\cdot 0/36 + 2\cdot 2\cdot 0/36 + \dots + 5\cdot 12\cdot 0/36 = 490/36$
and so $cov(X,Y) = EXY-EX\cdot EY = 490/36 - 7.0\cdot 70/36 = 0$
Obviously, if cov(X,Y)=0, then $\rho_{XY}=cor(X,Y)=cov(X,Y)/(\sigma_X\sigma_Y)=0$ as well
Let's do R checking:
d1=sample(1:6,10000,replace=TRUE)
d2=sample(1:6,10000,replace=TRUE)
x=d1+d2
y=abs(d1-d2)
mean(x)
mean(y)
mean(x*y)
mean(x*y)-mean(x)*mean(y)
or just
cov(x,y)
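Since there are only 36 equally likely outcomes, the covariance can also be computed exactly by enumeration rather than by simulation. A minimal sketch (the names `grid`, `sumv` and `diffv` are only for illustration):

```r
grid <- expand.grid(d1 = 1:6, d2 = 1:6)   # all 36 equally likely outcomes
sumv  <- grid$d1 + grid$d2                # X, the sum
diffv <- abs(grid$d1 - grid$d2)           # Y, the absolute difference
EX  <- mean(sumv)                         # 7
EY  <- mean(diffv)                        # 70/36
EXY <- mean(sumv * diffv)                 # 490/36
EXY - EX * EY                             # covariance, 0 up to rounding
```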
Note that we previously saw that X and Y are not independent, so here we have an example showing that a covariance of 0 does not imply independence! It does work the other way around, though:
Theorem: If X and Y are independent, then cov(X,Y) = 0
proof (in the case of X and Y continuous):
$$E[XY]=\int\!\!\int xyf(x,y)\,dx\,dy=\int\!\!\int xyf_X(x)f_Y(y)\,dx\,dy=\left(\int xf_X(x)\,dx\right)\left(\int yf_Y(y)\,dy\right)=EX\cdot EY$$
and so cov(X,Y) = EXY-EXEY = EXEY - EXEY = 0
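We saw above that E(X+Y) = EX + EY. How about V(X+Y)?
$$V(X+Y)=E[(X+Y)^2]-(E[X+Y])^2=EX^2+2EXY+EY^2-(EX)^2-2EX\cdot EY-(EY)^2=V(X)+V(Y)+2cov(X,Y)$$
and if $X\perp Y$ we have V(X+Y) = VX + VY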
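The identity V(X+Y) = V(X) + V(Y) + 2cov(X,Y) also holds exactly for sample variances and covariances, which makes it easy to illustrate in R. In this sketch the two dependent variables are made up purely for illustration.

```r
set.seed(1)
x <- rexp(1000, 2)
y <- x + rnorm(1000)                      # dependent on x by construction
lhs <- var(x + y)
rhs <- var(x) + var(y) + 2 * cov(x, y)
all.equal(lhs, rhs)                       # TRUE
```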
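Example Consider again the example from before: we have continuous rv's X and Y with joint density $f(x,y)=8xy$, $0\le x<y\le 1$. Find the covariance and the correlation of X and Y.
$cov(X,Y)=E[XY]-E[X]E[Y]$. We have seen before that $f_Y(y)=4y^3$, $0<y<1$, so
$$E[Y]=\int_{-\infty}^{\infty}yf_Y(y)\,dy=\int_0^1 y\cdot 4y^3\,dy=\tfrac45y^5\Big|_0^1=\tfrac45$$
Now $f_X(x)=\int_x^1 8xy\,dy=4x(1-x^2)$, $0<x<1$, so
$$E[X]=\int_0^1 x\cdot 4x(1-x^2)\,dx=4\left(\tfrac13-\tfrac15\right)=\tfrac{8}{15}$$
and
$$E[XY]=\int_0^1\!\!\int_0^y xy\cdot 8xy\,dx\,dy=\int_0^1\tfrac83y^5\,dy=\tfrac49$$
and so $cov(X,Y)=4/9-8/15\cdot 4/5 = 12/675$
Also
$$V(X)=E[X^2]-(EX)^2=\tfrac13-\tfrac{64}{225}=\tfrac{11}{225},\quad V(Y)=E[Y^2]-(EY)^2=\tfrac23-\tfrac{16}{25}=\tfrac{2}{75}$$
so $cor(X,Y)=\frac{12/675}{\sqrt{11/225\cdot 2/75}}=\frac{4}{\sqrt{66}}=0.492$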
How about an R check? To do one we need data from this distribution. Consider first $f_Y(y) = 4y^3$, $0<y<1$. This turns out to be a special case of the beta distribution, which has density $f(t)=ct^{a-1}(1-t)^{b-1}$, $0<t<1$. We have a=4, b=1. The R command to generate observations is rbeta(n,a,b)
Next consider the conditional distribution X|Y=y: $f_{X|Y=y}(x|y)=2x/y^2$, $0<x<y$. Now X' generated with rbeta(n,2,1) has density $f(x)=2x$, $0<x<1$, and we can use the following transformation: let Z=yX', then for $0<z<y$
$$P(Z\le z) = P(yX'\le z) = P(X'\le z/y) = \int_0^{z/y}2t\,dt = \int_0^z \frac{2u}{y^2}\,du$$
by the change of variables u=yt. Also 0<x<1 implies 0<yx<y, and so we have shown that Z has the same distribution as X|Y=y. Now we can generate our observations as follows:
n=10000
y=rbeta(n,4,1)
z=rbeta(n,2,1)
x=z*y
and check mean(x)=8/15, mean(y)=4/5, cov(x,y)=12/675, cor(x,y)=0.492
Example Say (X,Y) is a discrete rv with joint pmf f given by
| Y\X | 0 | 1 |
| 0 | a | c |
| 1 | b | d |
where a, b, c and d are numbers such that f is a pmf, that is a,b,c,d>0 and a+b+c+d=1
Now the marginals of X and Y are given by
$f_X(0)=a+b$, $f_X(1)=c+d$
$f_Y(0)=a+c$, $f_Y(1)=b+d$
so
EX=c+d and EY=b+d
also EXY=d, and so
$cov(X,Y) = d-(c+d)(b+d) = d-bc-(c+b)d-d^2 = d-bc-(1-a-d)d-d^2 = d-bc-d+ad+d^2-d^2 = ad-bc$
so X and Y are uncorrelated iff ad-bc=0
Of course

When are X and Y independent? For that we need $f(x,y)=f_X(x)f_Y(y)$ for all x and y, so we need
a=(a+b)(a+c)
b=(a+b)(b+d)
c=(c+d)(a+c)
d=(c+d)(b+d)
but
$a = (a+b)(a+c) = a^2+(c+b)a+bc = a^2+(1-a-d)a+bc = a-ad+bc$, or ad-bc=0!
Similarly we find that each of the other three equations holds iff ad-bc=0. So $X\perp Y$ iff ad-bc=0, and here we have a case where $X\perp Y$ iff cov(X,Y)=0.
Notice that if $X\perp Y$ then $rX+s\perp Y$ for any r,s with r≠0, so the above does not depend on the fact that X and Y take values 0 and 1, although the proof is much easier this way.
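The formula cov(X,Y) = ad-bc is easy to check numerically. The helper `cov01` below is a hypothetical function written for this sketch, using EX = c+d, EY = b+d and EXY = d; the probabilities passed to it are arbitrary examples.

```r
# covariance of a 0/1-valued pair with f(0,0)=a, f(0,1)=b, f(1,0)=c, f(1,1)=d
cov01 <- function(a, b, c, d) {
  stopifnot(abs(a + b + c + d - 1) < 1e-12)
  EX <- c + d; EY <- b + d; EXY <- d
  EXY - EX * EY
}
cov01(0.1, 0.2, 0.3, 0.4)   # ad - bc = 0.04 - 0.06 = -0.02
cov01(0.4, 0.1, 0.4, 0.1)   # ad - bc = 0, and here f(x,y) = fX(x) fY(y)
```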
If you know cov(X,Y)=2.37, what does this tell you? Not much, really, except X and Y are not independent. But if I tell you cor(X,Y)=0.89, that tells us more:
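Theorem
1) $|\rho_{XY}|\le 1$
2) $\rho_{XY}=\pm 1$ iff there exist a≠0 and b such that P(X=aY+b)=1
Proof
1) Consider the function $h(t) = E[((X-\mu_X)t+(Y-\mu_Y))^2]$. Now h(t) is the expectation of a non-negative function, so h(t)≥0 for all t. Also
$$h(t)=t^2E(X-\mu_X)^2+2tE[(X-\mu_X)(Y-\mu_Y)]+E(Y-\mu_Y)^2=V(X)t^2+2cov(X,Y)t+V(Y)$$
and
$$D=[2cov(X,Y)]^2-4V(X)V(Y)\le 0$$
because the quadratic function has at most one real root and so the discriminant D has to be less than or equal to 0. Therefore
$$cov(X,Y)^2\le V(X)V(Y),\quad\text{and so}\quad \rho_{XY}^2=\frac{cov(X,Y)^2}{\sigma_X^2\sigma_Y^2}\le 1$$
2) Continuing with the argument above we see that $|\rho_{XY}|=1$ iff D=0, that is iff h(t) has a single root. But $[(X-\mu_X)t+(Y-\mu_Y)]^2\ge 0$ for all t, and by the lemma above we have
h(t)=0 iff $P([(X-\mu_X)t+(Y-\mu_Y)]^2=0)=1$
This is the same as
$P((X-\mu_X)t+(Y-\mu_Y)=0)=1$
so
P(X=aY+b)=1 with $a=-1/t$ and $b=\mu_X+\mu_Y/t$, where $t=-cov(X,Y)/V(X)$ is the single root of h(t). Note that t≠0 because cov(X,Y)≠0 when $|\rho_{XY}|=1$.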
A little bit of care with covariance and correlation: they are designed to measure linear relationships. Consider the following:
Example Let X~U[-1,1], and let $Y=X^2$. Then EX=0 and $EY = EX^2 = V(X)+(EX)^2 = 1/3$. Also $E[XY]=E[X^3]=0$, so $cov(X,Y)=0-0\cdot 1/3 = 0$.
So here is a case of two uncorrelated rv's, but if we know X we know exactly what Y is! Correlation is only a sensible measure of linear relationships, not any others.
So as we said above, if you know cov(X,Y)=2.37, that does not tell you much. But if you know cor(X,Y)=0.89 and if there is a linear relationship between X and Y, we know that it is a strong positive one.
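The example with X~U[-1,1] and Y=X² is easy to simulate. This short sketch shows the sample correlation close to 0 even though Y is completely determined by X.

```r
set.seed(1)
x <- runif(100000, -1, 1)
y <- x^2                 # y is a deterministic function of x
mean(y)                  # close to 1/3
cov(x, y)                # close to 0
cor(x, y)                # close to 0 as well
```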
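A nice property of the correlation is that it is scale-invariant:
Let a>0 and b be any numbers, then cor(aX+b,Y)=cor(X,Y):
$$cor(aX+b,Y)=\frac{cov(aX+b,Y)}{\sqrt{V(aX+b)V(Y)}}=\frac{a\,cov(X,Y)}{\sqrt{a^2V(X)V(Y)}}=\frac{a}{|a|}cor(X,Y)=cor(X,Y)$$
(for a<0 the same calculation shows that only the sign changes, cor(aX+b,Y)=-cor(X,Y)). So for example the correlation between the ocean temperature and the windspeed of a hurricane is the same whether the temperature is measured in Fahrenheit or Centigrade.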
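Here is a small illustration of the invariance under changes of units. The temperature and wind speed values are simulated, entirely hypothetical data, made up only to have two correlated variables.

```r
set.seed(1)
celsius <- rnorm(500, 28, 1)                       # hypothetical ocean temperatures
wind <- 50 + 10 * celsius + rnorm(500, 0, 20)      # hypothetical wind speeds
fahrenheit <- 9/5 * celsius + 32                   # linear change of units, a = 9/5 > 0
all.equal(cor(celsius, wind), cor(fahrenheit, wind))    # TRUE
all.equal(cor(-celsius, wind), -cor(celsius, wind))     # TRUE: a < 0 flips the sign
```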
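Example Say (X,Y) is a discrete rv with joint pmf $f(x,y)=(1-p)^2p^x$, $x,y\in\{0,1,\dots\}$, $y\le x$, and $0<p<1$. Find E[Y|X=x]
First we need $f_{Y|X=x}(y|x)$, and for that we need $f_X(x)$:
$$f_X(x)=\sum_{y=0}^x(1-p)^2p^x=(x+1)(1-p)^2p^x$$
so $f_{Y|X=x}(y|x)=f(x,y)/f_X(x)=(1-p)^2p^x/((1-p)^2(x+1)p^x)=1/(x+1)$, so Y|X=x has a discrete uniform distribution on {0,1,..,x}. Therefore
$$E[Y|X=x]=\sum_{y=0}^x\frac{y}{x+1}=\frac{1}{x+1}\cdot\frac{x(x+1)}{2}=\frac{x}{2}$$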
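A small numerical check that E[Y|X=x] = x/2, computed directly from the joint pmf. This is only a sketch: p = 0.4 is an arbitrary choice and `cond_mean` is a helper name introduced here for illustration.

```r
p <- 0.4
f <- function(x, y) ifelse(y <= x, (1 - p)^2 * p^x, 0)   # joint pmf
cond_mean <- function(x) {
  y <- 0:x
  sum(y * f(x, y)) / sum(f(x, y))    # sum of y f(x,y) divided by fX(x)
}
sapply(0:5, cond_mean)               # compare with (0:5)/2
```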
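Example Consider again the example from before: we have continuous rv's X and Y with joint density $f(x,y)=8xy$, $0\le x<y\le 1$. We have found $f_Y(y) = 4y^3$, $0<y<1$, and $f_{X|Y=y}(x|y) = 2x/y^2$, $0\le x\le y$. So
$$E[X|Y=y]=\int_0^y x\cdot\frac{2x}{y^2}\,dx=\frac{2}{y^2}\cdot\frac{y^3}{3}=\frac{2y}{3}$$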
Throughout this calculation we treated y as a constant. Now, though, we can change our point of view and consider E[X|Y=y] = 2y/3 as a function of y:
g(y)=E[X|Y=y]=2y/3
What are the values of y? Well, they are the observations we might get from the rv Y, so we can also write
g(Y)=E[X|Y=Y]=2Y/3
But Y is a rv, so 2Y/3 is one as well, and we see that we can define a rv Z=g(Y)=E[X|Y]
Recall that the expression $f_{X|Y}$ does not make sense. Now we see that on the other hand the expression E[X|Y] makes perfectly good sense!
Example: An urn contains 2 white and 3 black balls. We pick two balls from the urn. Let X denote the number of white balls chosen. An additional ball is drawn from the remaining three. Let Y equal 1 if this ball is white and 0 otherwise.
For example f(0,0) = P(X=0,Y=0) = 3/5*2/4*1/3 = 1/10. (choose black-black-black)
The complete pmf is given by:
| Y\X | 0 | 1 | 2 |
| 0 | 1/10 | 2/5 | 1/10 |
| 1 | 1/5 | 1/5 | 0 |
Now for the marginals we have, for example fX(0)=1/10+1/5=3/10, or in general:
| x | 0 | 1 | 2 |
| P(X=x) | 3/10 | 3/5 | 1/10 |
| y | 0 | 1 |
| P(Y=y) | 3/5 | 2/5 |
$f_{X|Y=0}(0|0) = f(0,0)/f_Y(0) = (1/10)/(3/5) = 1/6$, and in general the conditional distribution of X|Y=0 is
| x | 0 | 1 | 2 |
| P(X=x|Y=0) | 1/6 | 2/3 | 1/6 |
The conditional distribution of X|Y=1 is
| x | 0 | 1 | 2 |
| P(X=x|Y=1) | 1/2 | 1/2 | 0 |
Finally the conditional r.v. Z = E[X|Y] has pmf
| z | 1 | 1/2 |
| P(Z=z) | 3/5 | 2/5 |
How about using simulation to do these calculations? - program urn1
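The program urn1 is not reproduced here. As a hypothetical stand-in, the following sketch simulates the experiment by drawing three balls without replacement; all names in it are chosen for illustration only.

```r
set.seed(1)
nsim <- 100000
urn <- c("w", "w", "b", "b", "b")
# each column holds three balls drawn without replacement
draws <- replicate(nsim, sample(urn, 3))
x <- colSums(draws[1:2, ] == "w")      # white balls among the first two
y <- as.numeric(draws[3, ] == "w")     # 1 if the third ball is white
round(table(y, x) / nsim, 3)           # compare with the joint pmf
m <- tapply(x, y, mean)                # E[X|Y=0] and E[X|Y=1]
m
mean(x)                                # E[X]
```

The conditional means should come out close to 1 and 1/2, and the overall mean close to 4/5.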
There is a very useful formula for the expectation of conditional r.v.s: E[E[X|Y]] = E[X]
In the urn example we have
E[E[X|Y]] = E[Z] = 1·3/5 + 1/2·2/5 = 4/5
E[X] = 0·3/10 + 1·3/5 + 2·1/10 = 4/5
There is a simple explanation for this seemingly complicated formula!
Here is a corresponding formula for the variance:
V(X) = E[V(X|Y)] + V[E(X|Y)]
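Both formulas can be verified exactly with the joint pmf of the urn example. In this sketch the matrix `f` holds the pmf with rows y = 0,1 and columns x = 0,1,2; the variable names are only for illustration.

```r
f <- matrix(c(1/10, 2/5, 1/10,
              1/5,  1/5, 0), nrow = 2, byrow = TRUE)
xv <- 0:2
fY <- rowSums(f)                                       # marginal of Y
fX <- colSums(f)                                       # marginal of X
condMean <- as.vector(f %*% xv) / fY                   # E[X|Y=y]
condVar  <- as.vector(f %*% xv^2) / fY - condMean^2    # V(X|Y=y)
EX <- sum(xv * fX)
VX <- sum(xv^2 * fX) - EX^2
EV <- sum(condVar * fY)                                # E[V(X|Y)]
VE <- sum(condMean^2 * fY) - sum(condMean * fY)^2      # V[E(X|Y)]
c(EX, sum(condMean * fY))                              # E[X] and E[E[X|Y]]
c(VX, EV + VE)                                         # V(X) and E[V(X|Y)] + V[E(X|Y)]
```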
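Example: let's say we have a continuous bivariate random vector with the joint pdf f(x,y) = c(x+2y) if 0<x<2 and 0<y<1, 0 otherwise.
Find c
$$1=\int_0^2\!\!\int_0^1c(x+2y)\,dy\,dx=c\int_0^2(x+1)\,dx=4c,\quad\text{so } c=1/4$$
Find the marginal distribution of X
$$f_X(x)=\int_0^1\tfrac14(x+2y)\,dy=\tfrac14(x+1),\quad 0<x<2$$
Find the marginal distribution of Y
$$f_Y(y)=\int_0^2\tfrac14(x+2y)\,dx=\tfrac12(1+2y),\quad 0<y<1$$
Find the conditional pdf of Y|X=x
$$f_{Y|X=x}(y|x)=\frac{f(x,y)}{f_X(x)}=\frac{x+2y}{x+1},\quad 0<y<1$$
Note: this is a proper pdf for any fixed value of x
Find E[Y|X=x]
$$E[Y|X=x]=\int_0^1y\,\frac{x+2y}{x+1}\,dy=\frac{x/2+2/3}{x+1}=\frac{3x+4}{6(x+1)}$$
Let Z=E[Y|X]. Find E[Z]
$$E[Z]=\int_0^2\frac{3x+4}{6(x+1)}\cdot\frac{x+1}{4}\,dx=\frac{1}{24}\int_0^2(3x+4)\,dx=\frac{14}{24}=\frac{7}{12}$$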
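These calculations can be checked with numerical integration. The sketch below assumes c = 1/4 (the value that makes the density integrate to 1) and uses R's `integrate`; the helper names `fX` and `condMean` are introduced only for this illustration.

```r
c0 <- 1/4
f <- function(x, y) c0 * (x + 2 * y)
# marginal density of X: integrate out y for each value of x
fX <- function(x) sapply(x, function(xx) integrate(function(y) f(xx, y), 0, 1)$value)
total <- integrate(fX, 0, 2)$value             # total probability
# E[Y|X=x] for a single value x
condMean <- function(x) integrate(function(y) y * f(x, y), 0, 1)$value / fX(x)
condMean(1)                                    # compare with (3*1+4)/(6*(1+1))
# E[Z] = E[E[Y|X]], integrating the conditional mean against fX
EZ <- integrate(function(x) sapply(x, condMean) * fX(x), 0, 2)$value
c(total = total, EZ = EZ)
```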
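Example Let X~Exp($\lambda$), find its moment generating function (mgf) $\psi(t)=E[e^{tX}]$.
$$\psi(t)=\int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx=\frac{\lambda}{\lambda-t}\quad\text{for } t<\lambda$$
The name comes from the following theorem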
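Theorem
Say $\psi(t)$ is the mgf of a rv X. Say there exists an $\varepsilon>0$ such that $|\psi(t)|<\infty$ for all t in $(-\varepsilon,\varepsilon)$. Then
$\psi^{(k)}(0) = EX^k$ for all k.
Example For the exponential rv we have
$$\psi'(t)=\frac{\lambda}{(\lambda-t)^2},\ \psi'(0)=\frac1\lambda=EX;\quad \psi''(t)=\frac{2\lambda}{(\lambda-t)^3},\ \psi''(0)=\frac{2}{\lambda^2}=EX^2$$
Warning: nobody uses the moment generating function to generate moments! It has other uses: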
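Theorem
Let $X_1,\dots,X_n$ be a sequence of independent rv's with mgf's $\psi_i$, and let $Z=\sum X_i$, then
$\psi_Z(t)=\prod\psi_i(t)$
If the distributions of the $X_i$ are the same as well, then $\psi_i=\psi_X$ for all i and
$\psi_Z(t)=(\psi_X(t))^n$
proof
$$\psi_Z(t)=E\left[e^{t\sum X_i}\right]=E\left[\prod e^{tX_i}\right]=\prod E\left[e^{tX_i}\right]=\prod\psi_i(t)$$
where the third equality uses the independence of the $X_i$.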
Here is a very deep theorem, without proof:
Theorem
Let X and Y be rv's with mgf's $\psi_X$ and $\psi_Y$, respectively. If both mgf's are finite in an open neighborhood of 0 and if $\psi_X(t) = \psi_Y(t)$ for all t in this neighborhood, then $F_X(u)=F_Y(u)$ for all u.
Example Show that the sum of two independent exponential rv's is not an exponential rv.
Say X~Exp($\lambda$) and Y~Exp($\rho$), then $\psi_X(t)=\lambda/(\lambda-t)$ and $\psi_Y(t)=\rho/(\rho-t)$, so $\psi_{X+Y}(t)=\frac{\lambda}{\lambda-t}\cdot\frac{\rho}{\rho-t}\ne\frac{a}{a-t}$ for any a.
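To illustrate the product formula for the mgf of a sum, here is a sketch that compares a numerically integrated mgf and a simulated one with the closed forms. The rates 2 and 3 and the point t0 = 0.5 are arbitrary choices, and `mgf_exp` is a helper name used only here.

```r
lambda <- 2; rho <- 3; t0 <- 0.5
# mgf of an exponential rv by numerical integration of exp(t*x) * density
mgf_exp <- function(t, rate) {
  integrate(function(x) exp(t * x) * dexp(x, rate), 0, Inf)$value
}
c(mgf_exp(t0, lambda), lambda / (lambda - t0))     # both about 4/3
set.seed(1)
z <- rexp(200000, lambda) + rexp(200000, rho)      # sum of independent exponentials
simulated <- mean(exp(t0 * z))
product <- lambda / (lambda - t0) * rho / (rho - t0)
c(simulated = simulated, product = product)        # product is 1.6
```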
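Example Consider the two pdfs given by
$$f_1(x)=\frac{1}{\sqrt{2\pi}\,x}e^{-(\log x)^2/2},\quad f_2(x)=f_1(x)\left[1+\sin(2\pi\log x)\right],\quad x>0$$
($f_1$ is called a log-normal distribution)
Now it turns out that if $X_1$ has density $f_1$, then
$$E[X_1^r]=\int_0^\infty x^r\frac{1}{\sqrt{2\pi}\,x}e^{-(\log x)^2/2}\,dx=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{rt-t^2/2}\,dt=e^{r^2/2},\quad r=0,1,2,\dots$$
where we use the change of variables t=log(x)
but
$$\int_0^\infty x^rf_1(x)\sin(2\pi\log x)\,dx=e^{r^2/2}\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-t^2/2}\sin(2\pi t)\,dt=0$$
(use change of variables t=log(x)-r; the last integrand is an odd function)
and so a rv with density $f_2$ has the same moments as $X_1$. Here is an example that shows that the condition of the theorem above is also necessary: without it you can have two rv's with all their moments equal but different distributions.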
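The equality of the moments can be checked numerically. This sketch assumes the standard form of this example, $f_2(x)=f_1(x)[1+\sin(2\pi\log x)]$ with $f_1$ the standard log-normal density, and works on the log scale t = log(x), where $x^rf(x)\,dx$ becomes $e^{rt}g(t)\,dt$. The integration limits ±20 and the names `g1`, `g2` and `moment` are choices made for this illustration.

```r
g1 <- function(t) dnorm(t)                          # f1 on the log scale
g2 <- function(t) dnorm(t) * (1 + sin(2 * pi * t))  # assumed form of f2 on the log scale
moment <- function(g, r) {
  integrate(function(t) exp(r * t) * g(t), -20, 20,
            subdivisions = 2000L, rel.tol = 1e-6)$value
}
m1 <- sapply(0:3, function(r) moment(g1, r))
m2 <- sapply(0:3, function(r) moment(g2, r))
rbind(m1, m2, exact = exp((0:3)^2 / 2))             # rows should agree
```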