Expectations

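The expectation (or expected value) of g(X), where X is a random variable with pmf (pdf) f, is defined by

$$E[g(X)] = \sum_x g(x)f(x) \ \text{(X discrete)}, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx \ \text{(X continuous)}$$

We use the notation Eg(X) or E[g(X)].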

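Example: we roll a fair die until the first time we get a six. What is the expected number of rolls?
We saw that $f(x) = \frac{1}{6}\left(\frac{5}{6}\right)^{x-1}$ for $x \in \{1,2,\dots\}$. Here we just have g(x)=x, so

$$EX = \sum_{x=1}^{\infty} x\,\frac{1}{6}\left(\frac{5}{6}\right)^{x-1}$$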

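How do we compute this sum? Here is a "standard" trick: for |t|<1, differentiate the geometric series term by term,

$$\sum_{x=1}^{\infty} x t^{x-1} = \frac{d}{dt}\sum_{x=0}^{\infty} t^{x} = \frac{d}{dt}\,\frac{1}{1-t} = \frac{1}{(1-t)^2}$$

and so with t=5/6 we find

$$EX = \frac{1}{6}\cdot\frac{1}{(1-5/6)^2} = \frac{1}{6}\cdot 36 = 6$$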

This is a special case of a geometric rv, that is, a discrete rv X with pmf $f(x)=p(1-p)^{x-1}$, x=1,2,... Note that if we replace 1/6 above with p, the same argument shows that $EX = p\cdot\frac{1}{(1-(1-p))^2} = \frac{1}{p}$.
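As a quick numerical check of the die example, here is a minimal R sketch. It truncates the infinite sum at 1000 terms (an arbitrary cutoff, the tail is negligible) and also simulates with rgeom, which counts the failures before the first success, so we add 1. The seed and sample size are arbitrary choices.

```r
# Expected number of rolls until the first six, computed two ways
p <- 1/6
x <- 1:1000                            # truncation point is an assumption
exact <- sum(x * p * (1 - p)^(x - 1))  # truncated version of the sum for EX
exact                                  # should be very close to 1/p

set.seed(1)                            # arbitrary seed, for reproducibility only
rolls <- rgeom(10000, p) + 1           # rgeom counts failures, so add the final success
mean(rolls)                            # should be close to 1/p as well
```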

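Example: say (X,Y) is a discrete rv with joint pmf $f(x,y)=cp^x$, $x,y \in \{0,1,\dots\}$, y≤x, and 0<p<1. Find c.
We already did that before by summing first over y and then over x. We can use the above for an even simpler proof:

$$1 = \sum_{x=0}^{\infty}\sum_{y=0}^{x} cp^x = c\sum_{x=0}^{\infty}(x+1)p^x = \frac{c}{1-p}\sum_{k=1}^{\infty} k(1-p)p^{k-1} = \frac{c}{1-p}\,EG = \frac{c}{(1-p)^2}$$

where G is a geometric rv with rate 1-p, so EG=1/(1-p) and therefore $c=(1-p)^2$.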

Example: X is said to have a uniform [A,B] distribution if f(x)=1/(B-A) for A<x<B, 0 otherwise.
Find $EX^k$ (this is called the kth moment of X).

$$EX^k = \int_A^B \frac{x^k}{B-A}\,dx = \frac{B^{k+1}-A^{k+1}}{(k+1)(B-A)}$$
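The kth moment of a uniform [A,B] rv works out to (B^(k+1)-A^(k+1))/((k+1)(B-A)), by integrating x^k/(B-A) from A to B. The following sketch compares this closed form with numerical integration. The function name kth_moment and the values of A, B and k are made up for illustration.

```r
# k-th moment of a U[A,B] rv: closed form versus numerical integration
# kth_moment is a hypothetical helper name
kth_moment <- function(k, A, B) (B^(k + 1) - A^(k + 1)) / ((k + 1) * (B - A))

A <- 1; B <- 3; k <- 2                 # arbitrary illustrative values
formula_value <- kth_moment(k, A, B)
numeric_value <- integrate(function(x) x^k * dunif(x, A, B), A, B)$value
c(formula = formula_value, numeric = numeric_value)
```

With k=1 the same helper returns the mean (A+B)/2.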

Some special expectations are the mean of X, defined by μ=EX, and the variance, defined by $\sigma^2=V(X)=E(X-\mu)^2$. Related to the variance is the standard deviation σ, the square root of the variance.

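Here are some formulas for expectations, where a and b are constants:

$$E[aX+b] = aEX+b, \qquad E[X+Y] = EX+EY, \qquad V(aX+b) = a^2V(X), \qquad V(X) = EX^2-\mu^2$$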

The last one, $V(X)=EX^2-\mu^2$, is a useful formula for finding the variance and/or the standard deviation. Here is its proof:
$V(X) = E[(X-\mu)^2] = E[X^2-2X\mu+\mu^2] = EX^2-2\mu EX+\mu^2 = EX^2-\mu^2$

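Example: find the mean and the standard deviation of a uniform [A,B] r.v.

$$EX = \int_A^B \frac{x}{B-A}\,dx = \frac{B^2-A^2}{2(B-A)} = \frac{A+B}{2}, \qquad EX^2 = \frac{B^3-A^3}{3(B-A)} = \frac{A^2+AB+B^2}{3}$$

$$V(X) = \frac{A^2+AB+B^2}{3}-\frac{(A+B)^2}{4} = \frac{(B-A)^2}{12}, \qquad \sigma = \frac{B-A}{\sqrt{12}}$$

Example: find the mean and the standard deviation of an exponential rv with rate λ, that is with density $f(x)=\lambda e^{-\lambda x}$, x>0.

$$EX = \int_0^\infty x\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}, \qquad EX^2 = \int_0^\infty x^2\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2}, \qquad V(X) = \frac{2}{\lambda^2}-\frac{1}{\lambda^2} = \frac{1}{\lambda^2}, \qquad \sigma = \frac{1}{\lambda}$$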

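Let's use R to check. Note that rexp is parametrized by the rate, just as we do here, so with λ=2 both the mean and the standard deviation should come out close to 1/λ=0.5.

Say λ=2
y=rexp(10000,rate=2)   # second argument is the rate λ
mean(y)                # should be close to 1/λ
sd(y)                  # should be close to 1/λ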

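One way to "link" probabilities and expectations is via the indicator function $I_A$, defined as

$$I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

because with this we have for a continuous r.v. X with density f:

$$E[I_A(X)] = \int_{-\infty}^{\infty} I_A(x)f(x)\,dx = \int_A f(x)\,dx = P(X \in A)$$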
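The indicator function turns a probability into an expectation, E[I_A(X)] = P(X in A), and in a simulation this simply means that the mean of the 0/1 indicator values estimates the probability. Here is a small sketch, assuming X~Exp(1) and A=(1,∞), both chosen only for illustration.

```r
# E[I_A(X)] = P(X in A), illustrated by simulation
set.seed(2)                        # arbitrary seed
x <- rexp(100000, rate = 1)        # assumed distribution for the illustration
ind <- as.numeric(x > 1)           # indicator of the set A = (1, Inf)
estimate <- mean(ind)              # estimates E[I_A(X)]
exact <- 1 - pexp(1, rate = 1)     # P(X > 1)
c(estimate = estimate, exact = exact)
```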

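Lemma: say we have a nonnegative rv X, that is P(X≥0)=1. Then P(X=0)=1 iff EX=0.

Proof: say P(X=0)=1, then X is a discrete rv with pmf f(0)=1 and so EX=0·1=0.

Conversely, say EX=0 and assume P(X=0)<1. Then P(X>0)=1-P(X=0)>1-1=0, so there exist δ>0 and ε>0 such that P(X>δ)>ε. Then if X is discrete

$$EX = \sum_x x f(x) \ge \sum_{x>\delta} x f(x) \ge \delta\sum_{x>\delta} f(x) = \delta P(X>\delta) > \delta\varepsilon > 0$$

and if X is continuous

$$EX = \int_0^\infty x f(x)\,dx \ge \int_\delta^\infty x f(x)\,dx \ge \delta\int_\delta^\infty f(x)\,dx = \delta P(X>\delta) > \delta\varepsilon > 0$$

In either case we have a contradiction with EX=0.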

Expectations of Random Vectors

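The definition of expectation easily generalizes to random vectors:

$$E[g(X,Y)] = \sum_x\sum_y g(x,y)f(x,y) \ \text{(discrete)}, \qquad E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)f(x,y)\,dx\,dy \ \text{(continuous)}$$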

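Example: let (X,Y) be a discrete random vector with $f(x,y) = (1/2)^{x+y}$, x≥1, y≥1. Find $E[XY^2]$.

First we have

$$E[XY^2] = \sum_{x=1}^{\infty}\sum_{y=1}^{\infty} xy^2\left(\tfrac12\right)^{x+y} = \left(\sum_{x=1}^{\infty} x\left(\tfrac12\right)^{x}\right)\left(\sum_{y=1}^{\infty} y^2\left(\tfrac12\right)^{y}\right), \qquad \sum_{x=1}^{\infty} x\,\tfrac12\left(\tfrac12\right)^{x-1} = 2$$

because this is the mean of a geometric rv with p=1/2. Next, differentiating the geometric series twice gives $\sum_{y\ge1} y(y-1)t^{y-2} = 2/(1-t)^3$, and so with t=1/2

$$\sum_{y=1}^{\infty} y^2\left(\tfrac12\right)^{y} = \tfrac12\sum_{y=1}^{\infty} y^2 t^{y-1} = \tfrac12\left[\frac{2t}{(1-t)^3}+\frac{1}{(1-t)^2}\right] = \tfrac12\,[8+4] = 6$$

and therefore $E[XY^2] = 2\cdot 6 = 12$.

Note that if we replace 1/2 with p we have just shown that E[X]=1/p, $E[X^2]=(2-p)/p^2$ and $V[X]=(1-p)/p^2$ for X~Geom(p).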
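A direct numerical check of this example: the sketch below sums g(x,y)f(x,y) over a truncated grid. The cutoff at 200 is an arbitrary choice (the remaining terms are far below machine precision), and the exact value is E[X]·E[Y^2] = 2·6 = 12.

```r
# E[X Y^2] for f(x,y) = (1/2)^(x+y), x,y >= 1, by a truncated double sum
x <- 1:200; y <- 1:200                          # truncation is an assumption
f <- outer(x, y, function(x, y) 0.5^(x + y))    # joint pmf on the grid
g <- outer(x, y, function(x, y) x * y^2)        # the function whose expectation we want
total_prob <- sum(f)                            # sanity check: pmf sums to 1
EXY2 <- sum(g * f)                              # E[X Y^2]
c(total_prob = total_prob, EXY2 = EXY2)
```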

Covariance and Correlation

The covariance of two r.v.s X and Y is defined by $cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]$
The correlation of X and Y is defined by $\rho_{XY}=cor(X,Y)=cov(X,Y)/(\sigma_X\sigma_Y)$

Note cov(X,X) = V(X)

As with the variance we have a simpler formula for actual calculations: cov(X,Y) = E(XY) - (EX)(EY)

Example: take the example of two rolls of a die, with X the sum and Y the absolute value of the difference. What is the covariance of X and Y? We have
$\mu_X$ = EX = 2*1/36 + 3*2/36 + ... + 12*1/36 = 7
$\mu_Y$ = EY = 0*6/36 + 1*10/36 + ... + 5*2/36 = 70/36
EXY = 0*2*1/36 + 1*2*0/36 + 2*2*0/36 + ... + 5*12*0/36 = 490/36
and so cov(X,Y) = EXY-EX*EY = 490/36 - 7*70/36 = 0

Obviously, if cov(X,Y)=0, then $\rho_{XY}=cor(X,Y)=cov(X,Y)/(\sigma_X\sigma_Y)=0$ as well.

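Let's check with R:

d1=sample(1:6,10000,replace=TRUE)   # first die
d2=sample(1:6,10000,replace=TRUE)   # second die
x=d1+d2                             # sum
y=abs(d1-d2)                        # absolute value of the difference
mean(x)                             # should be close to 7
mean(y)                             # should be close to 70/36
mean(x*y)                           # should be close to 490/36
mean(x*y)-mean(x)*mean(y)           # should be close to 0
# or just
cov(x,y)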
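Because there are only 36 equally likely outcomes, we can also do the calculation exactly instead of by simulation. A minimal sketch using expand.grid to enumerate all outcomes; the last two lines pick one cell (X=2, Y=1) to show that the joint probability is not the product of the marginals.

```r
# Exact covariance of sum and absolute difference of two fair dice
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)   # all 36 equally likely outcomes
x <- rolls$d1 + rolls$d2
y <- abs(rolls$d1 - rolls$d2)
EX <- mean(x); EY <- mean(y); EXY <- mean(x * y)
c(EX = EX, EY = EY, EXY = EXY, cov = EXY - EX * EY)

# X and Y are nevertheless not independent:
p_joint <- mean(x == 2 & y == 1)           # P(X=2, Y=1), impossible
p_prod  <- mean(x == 2) * mean(y == 1)     # P(X=2) P(Y=1), positive
```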

Note that we previously saw that X and Y are not independent, so we here have an example that a covariance of 0 does not imply independence! It does work the other way around, though:

Theorem: If X and Y are independent, then cov(X,Y) = 0

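Proof (in the case of X and Y continuous):

$$E[XY] = \int\!\!\int xy\,f(x,y)\,dx\,dy = \int\!\!\int xy\,f_X(x)f_Y(y)\,dx\,dy = \left(\int x f_X(x)\,dx\right)\left(\int y f_Y(y)\,dy\right) = EX\cdot EY$$

and so cov(X,Y) = E[XY]-EX·EY = 0.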

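We saw above that E(X+Y) = EX + EY. How about V(X+Y)?

$$V(X+Y) = E\left[\left((X-\mu_X)+(Y-\mu_Y)\right)^2\right] = E(X-\mu_X)^2 + 2E[(X-\mu_X)(Y-\mu_Y)] + E(Y-\mu_Y)^2 = VX + VY + 2cov(X,Y)$$

and if X⊥Y (that is, X and Y are independent) we have V(X+Y) = VX + VY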

Example Consider again the example from before: we have continuous rv's X and Y with joint density f(x,y)=8xy, 0≤x<y≤1. Find the covariance and the correlation of X and Y.

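cov(X,Y)=E[XY]-E[X]E[Y]. We have seen before that $f_Y(y)=4y^3$, 0<y<1, so

$$E[Y]=\int_{-\infty}^{\infty} yf_Y(y)\,dy = \int_0^1 y\cdot 4y^3\,dy = \tfrac45 y^5\Big|_0^1 = \tfrac45$$

Now

$$f_X(x) = \int_x^1 8xy\,dy = 4x(1-x^2),\ 0<x<1, \qquad E[X] = \int_0^1 x\cdot 4x(1-x^2)\,dx = 4\left(\tfrac13-\tfrac15\right) = \tfrac{8}{15}$$

and

$$E[XY] = \int_0^1\!\!\int_0^y xy\cdot 8xy\,dx\,dy = \int_0^1 \tfrac83 y^5\,dy = \tfrac49$$

and so cov(X,Y)=4/9-8/15·4/5 = 12/675 = 4/225

Also

$$V(X) = \tfrac13-\left(\tfrac{8}{15}\right)^2 = \tfrac{11}{225}, \qquad V(Y) = \tfrac23-\left(\tfrac45\right)^2 = \tfrac{2}{75}, \qquad \rho_{XY} = \frac{4/225}{\sqrt{\tfrac{11}{225}\cdot\tfrac{2}{75}}} = \frac{4}{\sqrt{66}} \approx 0.492$$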
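Here is a numerical check of the covariance and correlation for the density f(x,y)=8xy, 0≤x<y≤1, using nested calls to integrate. The helper name expect_xy is made up for this sketch; it returns E[g(X,Y)] by integrating first over x from 0 to y and then over y from 0 to 1.

```r
# E[g(X,Y)] for the joint density 8xy on 0 <= x < y <= 1
# expect_xy is a hypothetical helper name
expect_xy <- function(g) {
  inner <- function(y) sapply(y, function(yy)
    integrate(function(x) g(x, yy) * 8 * x * yy, 0, yy)$value)
  integrate(inner, 0, 1)$value
}
EX  <- expect_xy(function(x, y) x)
EY  <- expect_xy(function(x, y) y)
EXY <- expect_xy(function(x, y) x * y)
VX  <- expect_xy(function(x, y) x^2) - EX^2
VY  <- expect_xy(function(x, y) y^2) - EY^2
covXY <- EXY - EX * EY
corXY <- covXY / sqrt(VX * VY)
c(cov = covXY, cor = corXY)        # compare with 4/225 and 4/sqrt(66)
```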

Example: say (X,Y) is a discrete rv with joint pmf f given by

X\Y   0   1
0     a   b
1     c   d

where a, b, c and d are numbers such that f is a pmf, that is a,b,c,d>0 and a+b+c+d=1. Note that this is the most general case of a discrete random vector where X and Y each take just two values. What can be said in this generality?

Now the marginals of X and Y are given by

fX(0)=a+b, fX(1)=c+d
fY(0)=a+c, fY(1)=b+d

so

EX = 0·(a+b)+1·(c+d) = c+d
and
EY = 0·(a+c)+1·(b+d) = b+d

also EXY = 0·0·a + 1·0·b + 0·1·c + 1·1·d = d, and so

$cov(X,Y) = d-(c+d)(b+d) = d-cb-cd-bd-d^2 = d-bc-(c+b)d-d^2 = d-bc-(1-a-d)d-d^2 = d-bc-d+ad+d^2-d^2 = ad-bc$

so X and Y are uncorrelated iff ad-bc=0

Of course

When are X and Y independent? For that we need f(x,y)=fX(x)fY(y) for all x and y, so we need

a=(a+b)(a+c)
b=(a+b)(b+d)
c=(c+d)(a+c)
d=(c+d)(b+d)

but
$a = (a+b)(a+c) = a^2+(c+b)a+bc = a^2+(1-a-d)a+bc = a-ad+bc$, which holds iff ad-bc=0!
Similarly we find that each of the other three equations holds iff ad-bc=0. So X⊥Y (X and Y are independent) iff ad-bc=0, and here we have a case where X⊥Y iff cov(X,Y)=0.
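The 2×2 case is easy to play with numerically. In the sketch below cov_2x2 is a hypothetical helper that takes f(0,0)=a, f(0,1)=b, f(1,0)=c, f(1,1)=d (in the order x, then y) and returns EXY-EX·EY. The two parameter sets are made up: the first is a dependent pair, the second is the product of the marginals P(X=1)=0.8 and P(Y=1)=0.7.

```r
# Covariance of a general pmf on {0,1}x{0,1}; cov_2x2 is a hypothetical helper
cov_2x2 <- function(a, b, c, d) {
  stopifnot(isTRUE(all.equal(a + b + c + d, 1)))   # must be a pmf
  EX <- c + d; EY <- b + d; EXY <- d
  EXY - EX * EY                                    # algebraically equal to ad - bc
}
dependent   <- cov_2x2(0.4, 0.1, 0.1, 0.4)       # ad - bc = 0.16 - 0.01
independent <- cov_2x2(0.06, 0.14, 0.24, 0.56)   # joint pmf = product of marginals
c(dependent = dependent, independent = independent)
```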

Notice that if X⊥Y then (rX+s)⊥Y for any r,s with r≠0, so the above does not depend on the fact that X and Y take the values 0 and 1, although the proof is much easier this way.

If you know cov(X,Y)=2.37, what does this tell you? Not much, really, except X and Y are not independent. But if I tell you cor(X,Y)=0.89, that tells us more:

Theorem
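1) $|\rho_{XY}|\le 1$
2) $\rho_{XY}=\pm 1$ iff there exist a≠0 and b such that P(X=aY+b)=1

Proof
1) Consider the function $h(t) = E\left[\left((X-\mu_X)t+(Y-\mu_Y)\right)^2\right]$. Now h(t) is the expectation of a non-negative function, so h(t)≥0 for all t. Also

$$h(t) = t^2\sigma_X^2 + 2t\,cov(X,Y) + \sigma_Y^2, \qquad D = \left[2cov(X,Y)\right]^2 - 4\sigma_X^2\sigma_Y^2 \le 0$$

because the quadratic function h(t)≥0, so it has at most one real root and so the discriminant D has to be less than or equal to 0. Therefore $cov(X,Y)^2 \le \sigma_X^2\sigma_Y^2$, that is $|\rho_{XY}|\le 1$.

2) Continuing with the argument above we see that $|\rho_{XY}|=1$ iff D=0, that is iff h(t) has a single root. But $\left((X-\mu_X)t+(Y-\mu_Y)\right)^2\ge 0$ for all t, and by the lemma above we have
h(t)=0 iff $P\left(\left((X-\mu_X)t+(Y-\mu_Y)\right)^2=0\right)=1$
This is the same as
$P\left((X-\mu_X)t+(Y-\mu_Y)=0\right)=1$
so
$P(Y=-tX+\mu_Xt+\mu_Y)=1$, where t is the single root of h(t). Because $|\rho_{XY}|=1$ we have $t=-cov(X,Y)/\sigma_X^2\ne 0$, so we can solve for X and get P(X=aY+b)=1 with $a=-1/t$ and $b=\mu_X+\mu_Y/t$.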

A little bit of care with covariance and correlation: they are designed to measure linear relationships. Consider the following:

Example: let X~U[-1,1], and let $Y=X^2$. Then EX=0 and $EY = EX^2 = VX+(EX)^2 = VX = (1-(-1))^2/12 = 4/12 = 1/3$.

Also
$E[XY] = E[X^3] = \frac{1^4-(-1)^4}{4\,(1-(-1))} = 0$

so cov(X,Y)=0-0·1/3 = 0.

So here is a case of two uncorrelated rv's, but if we know X we know exactly what Y is! Correlation is only a sensible measure of linear relationships, not any others. So as we said above, if you know cov(X,Y)=2.37, that does not tell you much. But if you know cor(X,Y)=0.89 and if there is a linear relationship between X and Y, we know that it is a strong positive one.
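A quick simulation of this example, as a sketch with an arbitrary seed and sample size: the sample correlation of X and Y=X^2 is essentially 0 even though Y is a deterministic function of X.

```r
# Uncorrelated but completely dependent: X ~ U[-1,1], Y = X^2
set.seed(3)                      # arbitrary seed
x <- runif(100000, -1, 1)
y <- x^2
sample_cov <- cov(x, y)          # close to 0
sample_cor <- cor(x, y)          # close to 0
c(cov = sample_cov, cor = sample_cor, EY = mean(y))
```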

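A nice property of the correlation is that it is scale-invariant:
Let a>0 and b be any numbers, then cor(aX+b,Y)=cor(X,Y):

$$cor(aX+b,Y) = \frac{cov(aX+b,Y)}{\sigma_{aX+b}\,\sigma_Y} = \frac{a\,cov(X,Y)}{|a|\,\sigma_X\sigma_Y} = cor(X,Y)$$

(for a<0 the same calculation shows that the correlation just changes its sign). So for example the correlation between the ocean temperature and the wind speed of a hurricane is the same whether the temperature is measured in Fahrenheit or Centigrade.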
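To illustrate with the temperature example, here is a sketch using simulated numbers. The data are entirely made up (they are not real hurricane measurements), and the conversion F = 9/5·C + 32 is a linear map with positive slope.

```r
# Correlation is unchanged by a linear change of units (positive slope)
set.seed(4)                                       # arbitrary seed
celsius <- rnorm(1000, 28, 1)                     # made-up ocean temperatures
wind <- 50 + 10 * celsius + rnorm(1000, 0, 15)    # made-up wind speeds
fahrenheit <- 9/5 * celsius + 32
cor_c <- cor(celsius, wind)
cor_f <- cor(fahrenheit, wind)                    # same as cor_c
cor_neg <- cor(-celsius, wind)                    # negative slope flips the sign
c(celsius = cor_c, fahrenheit = cor_f, negated = cor_neg)
```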

Conditional Expectation and Variance

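Say X|Y=y is a conditional r.v. with pmf (pdf) $f_{X|Y=y}$. Then the conditional expectation of g(X)|Y=y is defined by

$$E[g(X)|Y=y] = \sum_x g(x)f_{X|Y=y}(x|y) \ \text{(discrete)}, \qquad E[g(X)|Y=y] = \int_{-\infty}^{\infty} g(x)f_{X|Y=y}(x|y)\,dx \ \text{(continuous)}$$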

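Example: say (X,Y) is a discrete rv with joint pmf $f(x,y)=(1-p)^2p^x$, $x,y \in \{0,1,\dots\}$, y≤x, and 0<p<1. Find E[Y|X=x].
First we need $f_{Y|X=x}(y|x)$, and for that we need $f_X(x)$:

$$f_X(x) = \sum_{y=0}^{x}(1-p)^2p^x = (x+1)(1-p)^2p^x$$

so $f_{Y|X=x}(y|x)=f(x,y)/f_X(x)=(1-p)^2p^x/\left((x+1)(1-p)^2p^x\right)=1/(x+1)$ for y=0,1,...,x, so Y|X=x has a discrete uniform distribution on {0,1,..,x}. Therefore

$$E[Y|X=x] = \sum_{y=0}^{x}\frac{y}{x+1} = \frac{1}{x+1}\cdot\frac{x(x+1)}{2} = \frac{x}{2}$$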

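Example: consider again the example from before: we have continuous rv's X and Y with joint density f(x,y)=8xy, 0≤x<y≤1. We have found $f_Y(y) = 4y^3$, 0<y<1, and $f_{X|Y=y}(x|y) = 2x/y^2$, 0≤x≤y. So

$$E[X|Y=y] = \int_0^y x\,\frac{2x}{y^2}\,dx = \frac{2}{y^2}\cdot\frac{y^3}{3} = \frac{2y}{3}$$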

Throughout this calculation we treated y as a constant. Now, though, we can change our point of view and consider E[X|Y=y] = 2y/3 as a function of y:

g(y)=E[X|Y=y]=2y/3

What are the values of y? Well, they are the observations we might get from the rv Y, so we can also write

g(Y)=E[X|Y=Y]=2Y/3

But Y is a rv, and so is 2Y/3, and we see that we can define a rv Z=g(Y)=E[X|Y]

Recall that the expression $f_{X|Y}$ does not make sense. Now we see that on the other hand the expression E[X|Y] makes perfectly good sense!

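Let's continue this example and find the conditional variance of X|Y=y:

$$E[X^2|Y=y] = \int_0^y x^2\,\frac{2x}{y^2}\,dx = \frac{y^2}{2}, \qquad V[X|Y=y] = \frac{y^2}{2}-\left(\frac{2y}{3}\right)^2 = \frac{y^2}{18}$$

and again we can consider the conditional variance of X|Y: $V[X|Y]=Y^2/18$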

Example: an urn contains 2 white and 3 black balls. We pick two balls from the urn. Let X denote the number of white balls chosen. An additional ball is drawn from the remaining three. Let Y equal 1 if this ball is white and 0 otherwise.
For example f(0,0) = P(X=0,Y=0) = 3/5*2/4*1/3 = 1/10 (choose black-black-black).
The complete pmf is given by:
Y\X   0      1      2
0     1/10   2/5    1/10
1     1/5    1/5    0

Now for the marginals we have, for example fX(0)=1/10+1/5=3/10, or in general:
x 0 1 2
P(X=x) 3/10 3/5 1/10

and fY(0)=1/10+2/5+1/10=6/10, or
y 0 1
P(Y=y) 3/5 2/5

$f_{X|Y=0}(0|0) = f(0,0)/f_Y(0) = (1/10)/(3/5) = 1/6$, and in general the conditional distribution of X|Y=0 is
x 0 1 2
P(X=x|Y=0) 1/6 2/3 1/6

and so E[X|Y=0] = 0·1/6+1·2/3+2·1/6 = 1.0

The conditional distribution of X|Y=1 is
x 0 1 2
P(X=x|Y=1) 1/2 1/2 0

and so E[X|Y=1] = 0·1/2+1·1/2+2·0 = 1/2

Finally the conditional r.v. Z = E[X|Y] has pmf
z 1 1/2
P(Z=z) 3/5 2/5

with this we can find E[Z] = E[E[X|Y]] = 1·3/5+1/2·2/5 = 4/5

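There is a very useful formula for the expectation of conditional r.v.s: E[X] = E[E[X|Y]]. We can check it in the example above, where we found E[E[X|Y]] = 4/5 and where directly from the marginal of X
E[X] = 0·3/10 + 1·3/5 + 2·1/10 = 4/5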

There is a simple explanation for this seemingly complicated formula!

Here is a corresponding formula for the variance:
V(X) = E[V(X|Y)] + V[E(X|Y)]
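Both formulas can be checked exactly on the urn example. The sketch below stores the joint pmf as a matrix with rows y=0,1 and columns x=0,1,2; the variable names are made up for illustration.

```r
# Check E[X] = E[E[X|Y]] and V(X) = E[V(X|Y)] + V[E(X|Y)] on the urn example
f <- matrix(c(1/10, 2/5, 1/10,
              1/5,  1/5, 0), nrow = 2, byrow = TRUE)  # rows: y = 0,1; columns: x = 0,1,2
xs <- c(0, 1, 2)
fY <- rowSums(f)                                  # marginal of Y
fX <- colSums(f)                                  # marginal of X
condE <- as.vector(f %*% xs) / fY                 # E[X|Y=y] for y = 0,1
condV <- as.vector(f %*% xs^2) / fY - condE^2     # V[X|Y=y] for y = 0,1
EX <- sum(xs * fX)
VX <- sum(xs^2 * fX) - EX^2
tower <- sum(condE * fY)                          # E[E[X|Y]]
decomp <- sum(condV * fY) + (sum(condE^2 * fY) - tower^2)  # E[V(X|Y)] + V[E(X|Y)]
c(EX = EX, tower = tower, VX = VX, decomp = decomp)
```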

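Example: let's say we have a continuous bivariate random vector with the joint pdf f(x,y) = c(x+2y) if 0<x<2 and 0<y<1, 0 otherwise.
Find c

$$1 = \int_0^2\!\!\int_0^1 c(x+2y)\,dy\,dx = c\int_0^2 (x+1)\,dx = 4c, \quad \text{so } c = \tfrac14$$

Find the marginal distribution of X

$$f_X(x) = \int_0^1 \tfrac14(x+2y)\,dy = \tfrac14(x+1), \quad 0<x<2$$

Find the marginal distribution of Y

$$f_Y(y) = \int_0^2 \tfrac14(x+2y)\,dx = \tfrac12(1+2y), \quad 0<y<1$$

Find the conditional pdf of Y|X=x

$$f_{Y|X=x}(y|x) = \frac{f(x,y)}{f_X(x)} = \frac{x+2y}{x+1}, \quad 0<y<1$$

Note: this is a proper pdf for any fixed value of x in (0,2)
Find E[Y|X=x]

$$E[Y|X=x] = \int_0^1 y\,\frac{x+2y}{x+1}\,dy = \frac{x/2+2/3}{x+1} = \frac{3x+4}{6(x+1)}$$

Let Z=E[Y|X]. Find E[Z]

$$E[Z] = \int_0^2 \frac{3x+4}{6(x+1)}\cdot\frac{x+1}{4}\,dx = \int_0^2 \frac{3x+4}{24}\,dx = \frac{14}{24} = \frac{7}{12}$$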
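All the steps of this example can be checked numerically. The sketch below works only from the unnormalized density x+2y on 0<x<2, 0<y<1; the helper names (joint_unnorm, int_y, condEY) are made up. By the formula E[Y]=E[E[Y|X]] the last number should agree with E[Y].

```r
# Numerical version of the steps above; all helper names are hypothetical
joint_unnorm <- function(x, y) x + 2 * y
# integral over y in (0,1) of h(y) * (x + 2y), vectorized over x
int_y <- function(x, h = function(y) 1) sapply(x, function(xx)
  integrate(function(y) h(y) * joint_unnorm(xx, y), 0, 1)$value)

c_const <- 1 / integrate(int_y, 0, 2)$value                  # normalizing constant c
fX <- function(x) c_const * int_y(x)                         # marginal density of X
condEY <- function(x) int_y(x, h = function(y) y) / int_y(x) # E[Y|X=x]
EZ <- integrate(function(x) condEY(x) * fX(x), 0, 2)$value   # E[Z] = E[E[Y|X]]
c(c = c_const, EZ = EZ)
```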

Moment Generating Functions

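The moment generating function (mgf) of a rv X is defined by $\Phi(t)=E[e^{tX}]$

Example: let X~Exp(λ), find Φ.

$$\Phi(t) = \int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx = \lambda\int_0^\infty e^{-(\lambda-t)x}\,dx = \frac{\lambda}{\lambda-t}, \quad t<\lambda$$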

The name comes from the following theorem

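Theorem: say Φ(t) is the mgf of a rv X. Say there exists an ε>0 such that |Φ(t)|<∞ for all t in (-ε,ε). Then
$\Phi^{(k)}(0) = EX^k$ for all k, where $\Phi^{(k)}$ is the kth derivative of Φ.

Example: for the exponential rv we have

$$\Phi'(t) = \frac{\lambda}{(\lambda-t)^2},\ \ \Phi'(0) = \frac{1}{\lambda} = EX, \qquad \Phi''(t) = \frac{2\lambda}{(\lambda-t)^3},\ \ \Phi''(0) = \frac{2}{\lambda^2} = EX^2, \qquad V(X) = \frac{2}{\lambda^2}-\frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$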
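A numerical check for the exponential rv, using the mgf λ/(λ-t) for t<λ: the sketch compares it with a direct numerical integration of e^(tx) against the density, and approximates the first two derivatives at 0 by finite differences. The values λ=2, t=0.5 and the step size h are arbitrary choices.

```r
# mgf of an Exp(lambda) rv and its derivatives at 0
lambda <- 2                                      # arbitrary rate
mgf_exact <- function(t) lambda / (lambda - t)   # valid for t < lambda
mgf_num <- function(t)
  integrate(function(x) exp(t * x) * dexp(x, rate = lambda), 0, Inf)$value
t0 <- 0.5
c(exact = mgf_exact(t0), numeric = mgf_num(t0))

h <- 1e-4                                        # step size for finite differences
d1 <- (mgf_exact(h) - mgf_exact(-h)) / (2 * h)                 # approximates EX
d2 <- (mgf_exact(h) - 2 * mgf_exact(0) + mgf_exact(-h)) / h^2  # approximates EX^2
c(EX = d1, EX2 = d2, VX = d2 - d1^2)
```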

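Warning: nobody uses the moment generating function to generate moments! It has other uses:
Theorem: let X1,..., Xn be a sequence of independent rv.s with mgf's $\Phi_i$, and let $Z=\sum X_i$, then
$\Phi_Z(t)=\prod \Phi_i(t)$
If the distributions of the Xi are the same as well, then $\Phi_i=\Phi_X$ for all i and
$\Phi_Z(t)=(\Phi_X(t))^n$
Proof:

$$\Phi_Z(t) = E\left[e^{t\sum X_i}\right] = E\left[\prod e^{tX_i}\right] = \prod E\left[e^{tX_i}\right] = \prod \Phi_i(t)$$

where the third equality uses the independence of the Xi.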

here is a very deep theorem, without proof:
Theorem
let X and Y be rv.s with mgf's ΦX and ΦY, respectively. If both mgf's are finite in an open neighborhood of 0 and if ΦX(t) = ΦY(t) for all t in this neighborhood, then FX(u)=FY(u) for all u.

Example: show that the sum of two independent exponential rv.s is not an exponential rv.
Say X~Exp(λ) and Y~Exp(ρ), then $\Phi_X(t)=\lambda/(\lambda-t)$ and $\Phi_Y(t)=\rho/(\rho-t)$, so $\Phi_{X+Y}(t)=\frac{\lambda}{\lambda-t}\cdot\frac{\rho}{\rho-t}\ne\frac{a}{a-t}$ for any a.

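Example: consider the two pdfs given by

$$f_1(x) = \frac{1}{\sqrt{2\pi}\,x}\,e^{-(\log x)^2/2}, \qquad f_2(x) = f_1(x)\left[1+\sin(2\pi\log x)\right], \qquad x>0$$

($f_1$ is called a log-normal distribution)

Now it turns out that if $X_1$ has density $f_1$, then

$$E[X_1^r] = \int_0^\infty x^r\,\frac{1}{\sqrt{2\pi}\,x}\,e^{-(\log x)^2/2}\,dx = \int_{-\infty}^{\infty} e^{rt}\,\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = e^{r^2/2}$$

where we use the change of variables t=log(x)
but for r=0,1,2,...

$$\int_0^\infty x^r f_1(x)\sin(2\pi\log x)\,dx = e^{r^2/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\sin(2\pi t)\,dt = 0$$

(use the change of variables t=log(x)-r; the last integrand is an odd function)
and so a rv $X_2$ with density $f_2$ has $E[X_2^r]=E[X_1^r]$ for all r. Here is an example that shows that the condition of the theorem above is needed: without it you can have two rv's with all their moments equal but different distributions.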

Example

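Let's have another look at the example of the "device" which generates a random number Y according to an exponential distribution with rate λ, where λ=x with probability $0.5^x$, x=1,2,3,... We previously found that $f_Y(y) = 2e^y/(2e^y-1)^2$, y≥0. Let's find E[X|Y].

Note E[Y|X] would be easy (=1/X because Y|X=x~Exp(x)), E[Y] would be a simple calculus problem ($\int_0^\infty y\,2e^y/(2e^y-1)^2\,dy$) and E[X] would be the easiest (=2 because X~Geom(1/2)); just E[X|Y=y] needs a little work. With $q=e^{-y}/2$ the joint density is $f(x,y)=xe^{-xy}0.5^x=xq^x$ and $f_Y(y)=q/(1-q)^2$, so

$$E[X|Y=y] = \sum_{x=1}^{\infty} x\,\frac{xq^x}{f_Y(y)} = (1-q)^2\sum_{x=1}^{\infty} x^2q^{x-1} = (1-q)^2\,\frac{1+q}{(1-q)^3} = \frac{1+q}{1-q} = \frac{2e^y+1}{2e^y-1}$$

We said above that E[X]=2. Let's check the formula E[X]=E[E[X|Y]]. With the change of variables $u=2e^y-1$:

$$E[E[X|Y]] = \int_0^\infty \frac{2e^y+1}{2e^y-1}\cdot\frac{2e^y}{(2e^y-1)^2}\,dy = \int_1^\infty \frac{u+2}{u^3}\,du = \left[-\frac1u-\frac{1}{u^2}\right]_1^\infty = 2$$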
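Here is a numerical version of this check. Writing q=exp(-y)/2, the joint density is f(x,y)=x·q^x, so f_Y(y)=q/(1-q)^2 (which is the same as 2e^y/(2e^y-1)^2) and E[X|Y=y]=(1+q)/(1-q). The sketch first verifies the conditional expectation at one arbitrary point y0 with a truncated sum, and then integrates E[X|Y=y]·f_Y(y) over y. The function names and the truncation at 500 terms are assumptions made for the sketch.

```r
# Check E[X] = E[E[X|Y]] for the "device" example
q <- function(y) exp(-y) / 2
fY <- function(y) q(y) / (1 - q(y))^2            # marginal density of Y
condEX <- function(y) (1 + q(y)) / (1 - q(y))    # E[X|Y=y]

y0 <- 1; xs <- 1:500                             # arbitrary point, truncated sum
post <- xs * exp(-xs * y0) * 0.5^xs              # proportional to f(x|y0)
condEX_sum <- sum(xs * post) / sum(post)         # should match condEX(y0)

total <- integrate(fY, 0, Inf)$value             # density integrates to 1
EEX <- integrate(function(y) condEX(y) * fY(y), 0, Inf)$value   # E[E[X|Y]]
c(condEX = condEX(y0), by_sum = condEX_sum, total = total, EEX = EEX)
```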