
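Chebyshev's Inequality:
If X is a r.v. with mean μ and variance σ², then for any k>0:
$$P(|X-\mu| > k\sigma) \le \frac{1}{k^2}$$

proof:
$$\sigma^2 = E[(X-\mu)^2] \ge E\left[(X-\mu)^2 I(|X-\mu| > k\sigma)\right] \ge k^2\sigma^2 P(|X-\mu| > k\sigma)$$
and dividing both sides by k²σ² (for σ>0) gives the inequality.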
Example : Consider the uniform random variable with f(x) = 1 if 0<x<1, 0 otherwise. We already know that μ=0.5 and σ=1/√12 = 0.2887. Now Chebyshev says
P(|X-0.5|>k·0.2887)≤1/k²
For example
P(|X-0.5|>1·0.2887)≤1/1² = 1 (rather boring!)
or
P(|X-0.5|>3·0.2887)≤1/3² = 1/9
Actually P(|X-0.5|>0.866) = 0, because |X-0.5| can never exceed 0.5, so this is not a very good upper bound.
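To see how loose the bound is for other values of k, here is a minimal Python sketch that compares the Chebyshev bound with the exact tail probability for the uniform example. The function names are made up for this illustration; the exact tail uses the fact that |X-0.5| is uniform on (0, 0.5).

```python
import math

MU = 0.5
SIGMA = 1 / math.sqrt(12)  # standard deviation of U(0,1)

def chebyshev_bound(k):
    """Upper bound 1/k^2 from Chebyshev's inequality, capped at 1."""
    return min(1.0, 1.0 / k**2)

def exact_tail(k):
    """Exact P(|X - 0.5| > k*sigma) for X ~ U(0,1)."""
    # |X - 0.5| is uniform on (0, 0.5), so the tail is 1 - 2*k*sigma when positive.
    return max(0.0, 1.0 - 2.0 * k * SIGMA)

for k in (1, 1.5, 2, 3):
    print(f"k={k}: exact={exact_tail(k):.4f}  bound={chebyshev_bound(k):.4f}")
```

For every k the exact probability is below the bound, as it has to be, and it is already 0 once k·σ exceeds 0.5.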
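(Weak) Law of Large Numbers:
Let X1, X2, .. be a sequence of iid r.v.'s with mean μ, and let Zn = (X1+..+Xn)/n be the sample mean of the first n observations. Then for any ε>0:
$$P(|Z_n-\mu| > \epsilon) \to 0 \quad \text{as } n \to \infty$$
proof (assuming in addition that V(Xi)=σ²<∞)
We have E[Zn] = μ and V(Zn) = σ²/n. We apply Chebyshev's inequality to Zn with k = ε√n/σ:
$$P(|Z_n-\mu| > \epsilon) = P\left(|Z_n-\mu| > k\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$$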
This theorem forms the basis of (almost) all simulation studies: say we want to find a parameter θ of a population. We can generate data X1, .., Xn from a random variable X with pdf (pmf) f(x|θ) and find a function h such that Eh(X) = θ. Then by the law of large numbers
$$\frac{1}{n}\sum_{i=1}^n h(X_i) \to \theta$$
Example : in a game a player rolls 5 fair dice. He then moves his game piece along k fields on a board, where k is the smallest number on the dice + largest number on the dice. For example if his dice show 2, 2, 3, 5, 5 he moves 2+5 = 7 fields. What is the mean number of fields θ a player will move?
To do this analytically would be quite an exercise. To do it via simulation is easy:
Let X be a random vector of length 5 with independent components X[j] ∈ {1,..,6} and P(X[j]=k)=1/6
Let h(x) = min(x)+max(x), then Eh(X) = θ
Let X1, X2, .. be iid copies of X, then by the law of large numbers
$$\frac{1}{n}\sum_{i=1}^n h(X_i) \to \theta$$
The simulation is implemented in exminmax.
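The routine exminmax itself is not shown here. As a rough stand-in, the following Python sketch carries out the same idea; the function names, the seed and the number of runs are assumptions made for this illustration.

```python
import random

def h(x):
    """Number of fields moved: smallest plus largest number on the dice."""
    return min(x) + max(x)

def simulate_theta(n, seed=1):
    """Average of h over n simulated rolls of 5 fair dice."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        roll = [rng.randint(1, 6) for _ in range(5)]
        total += h(roll)
    return total / n

theta_hat = simulate_theta(100_000)
print(round(theta_hat, 2))
```

As a check on the output: replacing every die value d by 7-d leaves the distribution of the roll unchanged but swaps the roles of min and max, so E[min] + E[max] = 7, and the simulated mean should come out close to 7.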
We first need the definition of a normal (or Gaussian) r.v.:
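A random variable X is said to be normally distributed with mean μ and variance σ² if it has density:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$
If μ=0 and σ=1 we say X has a standard normal distribution.
We use the symbol Φ for the distribution function of a standard normal r.v., so
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$$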
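Let X1, X2, .. be an iid sequence of r.v.'s with mean μ and standard deviation σ. Then for every z
$$P\left(\sqrt{n}\,\frac{\bar{X}_n-\mu}{\sigma} \le z\right) \to \Phi(z) \quad \text{as } n \to \infty$$
where
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$$
is the sample mean of the first n observations.
Note that
$$E[\bar{X}_n] = \mu, \quad V[\bar{X}_n] = \frac{\sigma^2}{n}, \quad \text{hence} \quad E\left[\sqrt{n}\,\frac{\bar{X}_n-\mu}{\sigma}\right] = 0, \quad V\left[\sqrt{n}\,\frac{\bar{X}_n-\mu}{\sigma}\right] = 1$$
so the scaling in the CLT is just right to match the standard normal r.v.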
Let's do a simulation to illustrate the CLT: we will use the most basic r.v. of all, called a Bernoulli r.v., which has P(X=0)=1-p and P(X=1)=p (think indicator function for the coin toss). So we sample n Bernoulli r.v. with "success parameter" p and find their sample mean. Note that
$$E[X] = p, \quad V[X] = p(1-p)$$
so by the CLT √n(X̄-p)/√(p(1-p)) should be approximately standard normal.
The simulation is done in the routine cltexample 1
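The routine itself is not reproduced here. The Python sketch below is one possible version of such a simulation, with made-up function names and arbitrarily chosen n, p and number of runs: it standardizes many Bernoulli sample means and compares their empirical distribution function with Φ, which is computed from the error function.

```python
import math
import random

def std_normal_cdf(z):
    """Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_means(n, p, reps, seed=1):
    """Simulate reps values of sqrt(n)*(xbar - p)/sqrt(p*(1-p))."""
    rng = random.Random(seed)
    sd = math.sqrt(p * (1 - p))
    out = []
    for _ in range(reps):
        # sample mean of n Bernoulli(p) variables
        xbar = sum(rng.random() < p for _ in range(n)) / n
        out.append(math.sqrt(n) * (xbar - p) / sd)
    return out

z_values = standardized_means(n=200, p=0.3, reps=5000)
for z in (-1.0, 0.0, 1.0):
    empirical = sum(v <= z for v in z_values) / len(z_values)
    print(f"z={z:+.1f}: empirical={empirical:.3f}  Phi(z)={std_normal_cdf(z):.3f}")
```

Because the sample mean of Bernoulli variables is discrete, the empirical values will not match Φ exactly for any finite n; the agreement should improve as n grows.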
Recall: If a function h(x) has derivatives of order r, that is if h^(r)(x) exists, then for any constant a the Taylor polynomial of order r about a is defined by
$$T_r(x) = \sum_{i=0}^{r} \frac{h^{(i)}(a)}{i!}(x-a)^i$$
One of the most famous theorems in mathematics, Taylor's theorem, states that the remainder of the approximation h(x)-Tr(x) goes to 0 faster than the highest order term:
Taylor's theorem: if h^(r)(a) exists, then
$$\lim_{x \to a} \frac{h(x)-T_r(x)}{(x-a)^r} = 0$$
There are various formulas for the remainder term, but we won't need them here.
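Example : say h(x) = log(x+1) and we want to approximate h near a=0. Then we have
$$h'(x) = \frac{1}{x+1}, \quad h''(x) = -\frac{1}{(x+1)^2}$$
so h(0)=0, h'(0)=1, h''(0)=-1 and the first two Taylor polynomials are
$$T_1(x) = x, \quad T_2(x) = x - \frac{x^2}{2}$$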
The approximation is illustrated in taylor
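A small numerical check of Taylor's theorem for this example can be sketched in Python as follows. The function name is an assumption for this illustration; the formula for the terms uses h^(i)(0) = (-1)^(i-1)·(i-1)! for h(x) = log(x+1).

```python
import math

def taylor_log1p(x, r):
    """Taylor polynomial of order r for log(1+x) about a=0."""
    # h^(i)(0) = (-1)^(i-1) * (i-1)!, so the i-th term is (-1)^(i-1) * x^i / i.
    return sum((-1) ** (i - 1) * x**i / i for i in range(1, r + 1))

for x in (0.5, 0.1, 0.01):
    for r in (1, 2, 3):
        remainder = math.log1p(x) - taylor_log1p(x, r)
        print(f"x={x}, r={r}: remainder/x^r = {remainder / x**r:.5f}")
```

For each fixed r the printed ratio shrinks as x approaches 0, which is what the theorem asserts.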
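For our purposes we will need only first-order approximations (that is, using only the first derivative), but we will need a multivariate extension as follows: say X1, ..,Xn are r.v. with means μ1, .. ,μn and define X=(X1, ..,Xn) and μ=(μ1, .. ,μn). Suppose there is a differentiable function h(X) for which we want an approximation of the variance. Define
$$h_i'(\mu) = \left.\frac{\partial}{\partial t_i} h(t)\right|_{t=\mu}$$
The first order Taylor expansion of h about μ is
$$h(t) = h(\mu) + \sum_{i=1}^n h_i'(\mu)(t_i-\mu_i) + \text{remainder}$$
Forgetting about the remainder we have
$$E[h(X)] \approx h(\mu) + \sum_{i=1}^n h_i'(\mu) E[X_i-\mu_i] = h(\mu)$$
and
$$V[h(X)] \approx E\left[(h(X)-h(\mu))^2\right] \approx E\left[\left(\sum_{i=1}^n h_i'(\mu)(X_i-\mu_i)\right)^2\right] = \sum_{i=1}^n [h_i'(\mu)]^2 V[X_i] + 2\sum_{i<j} h_i'(\mu) h_j'(\mu)\,cov(X_i,X_j)$$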
Example : say we have a sample X1, ..,Xn from a Bernoulli r.v. with success parameter p. One popular measure of the chance of winning a game is the odds p/(1-p). For example when you roll a fair die the odds of getting a six are (1/6)/(1-1/6) = 1/5, or 1:5.
An obvious estimator for p is p̂, the sample mean, or here the proportion of "successes" in the n trials. Then an obvious estimator for the odds is p̂/(1-p̂). The question is, what is the variance of this estimator?
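Using the above approximation we get the following: let h(p)=p/(1-p), so h'(p)=1/(1-p)² and, since the sample proportion p̂ has variance p(1-p)/n,
$$V\left[\frac{\hat{p}}{1-\hat{p}}\right] \approx [h'(p)]^2\, V[\hat{p}] = \frac{1}{(1-p)^4} \cdot \frac{p(1-p)}{n} = \frac{p}{n(1-p)^3}$$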
The routine varapp1 illustrates how good an approximation this is.
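The routine is not shown here. A rough Python stand-in, with hypothetical function names and arbitrarily chosen p, n and number of runs, compares the first-order value p/(n(1-p)³) with the variance of the estimated odds over many simulated samples.

```python
import random
import statistics

def approx_var_odds(p, n):
    """First-order approximation p / (n * (1-p)^3) of the variance of the odds estimator."""
    return p / (n * (1 - p) ** 3)

def simulated_var_odds(p, n, reps, seed=1):
    """Sample variance of p_hat/(1-p_hat) over reps simulated samples of size n."""
    rng = random.Random(seed)
    odds = []
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if p_hat < 1:  # the odds are undefined when every trial is a success
            odds.append(p_hat / (1 - p_hat))
    return statistics.variance(odds)

p, n = 0.3, 200
approx = approx_var_odds(p, n)
simulated = simulated_var_odds(p, n, reps=5000)
print(f"approximation: {approx:.5f}  simulation: {simulated:.5f}")
```

Trying smaller n or values of p close to 1 should show where the first-order approximation starts to break down.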
Example : We have a r.v. X~U[0,1], and a r.v. Y~U[0,X]. Find an approximation of V[Y/(1+Y)]
Note: this is called a hierarchical model.
We have:
1) fX(x)=1 if 0<x<1, 0 otherwise
2) fY|X=x(y|x)=1/x if 0<y<x, 0 otherwise
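Now
$$E[Y] = E[E[Y|X]] = E[X/2] = \frac{1}{4}$$
$$E[Y^2] = E[E[Y^2|X]] = E[X^2/3] = \frac{1}{9}$$
so V[Y] = 1/9 - 1/16 = 7/144. With h(y) = y/(1+y) we have h'(y) = 1/(1+y)², so h'(1/4) = 16/25 and
$$V\left[\frac{Y}{1+Y}\right] \approx [h'(E[Y])]^2\, V[Y] = \left(\frac{16}{25}\right)^2 \cdot \frac{7}{144} = \frac{112}{5625} \approx 0.0199$$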
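As a numerical check, the Python sketch below simulates the hierarchical model directly and compares the sample variance of Y/(1+Y) with the first-order value (16/25)²·(7/144), which follows from E[Y] = 1/4, V[Y] = 7/144 and h'(y) = 1/(1+y)². The function name, seed and number of runs are assumptions for this illustration.

```python
import random
import statistics

def simulate_ratio(reps, seed=1):
    """Draw Y/(1+Y) from the model X ~ U[0,1], Y | X=x ~ U[0,x]."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x = rng.random()         # X ~ U[0,1]
        y = rng.uniform(0.0, x)  # Y given X=x is U[0,x]
        values.append(y / (1.0 + y))
    return values

first_order = (16 / 25) ** 2 * 7 / 144
sim_var = statistics.variance(simulate_ratio(100_000))
print(f"first-order approximation: {first_order:.4f}  simulation: {sim_var:.4f}")
```

The two numbers will not agree exactly: Y varies over a fairly wide range relative to its mean, so the first-order expansion is only a rough guide here.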
Example : let's consider the random vector with joint pdf f(x,y) = 6x, 0<x<y<1. Say we want to find V(X/Y). Then if we consider the function h(x,y) = x/y we have
$$\frac{\partial h}{\partial x} = \frac{1}{y}, \quad \frac{\partial h}{\partial y} = -\frac{x}{y^2}$$

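Now we need to find μX=E[X], V[X], μY=E[Y], V[Y] and cov(X,Y). The marginal densities are fX(x) = 6x(1-x) for 0<x<1 and fY(y) = 3y² for 0<y<1, so
$$\mu_X = \frac{1}{2}, \quad V[X] = \frac{3}{10} - \frac{1}{4} = \frac{1}{20}, \quad \mu_Y = \frac{3}{4}, \quad V[Y] = \frac{3}{5} - \frac{9}{16} = \frac{3}{80}$$
$$E[XY] = \int_0^1 \int_0^y 6x^2 y\,dx\,dy = \frac{2}{5}, \quad cov(X,Y) = \frac{2}{5} - \frac{1}{2}\cdot\frac{3}{4} = \frac{1}{40}$$
The partial derivatives of h(x,y) = x/y evaluated at (μX, μY) are 1/μY = 4/3 and -μX/μY² = -8/9, so the first-order approximation gives
$$V\left[\frac{X}{Y}\right] \approx \left(\frac{4}{3}\right)^2 \frac{1}{20} + \left(\frac{8}{9}\right)^2 \frac{3}{80} - 2 \cdot \frac{4}{3} \cdot \frac{8}{9} \cdot \frac{1}{40} = \frac{8}{135} \approx 0.0593$$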
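These quantities can also be estimated by simulation. The Python sketch below is one way to do it, under a sampling scheme worked out for this illustration: Y has marginal density 3y², so Y = U^(1/3) by inversion, and given Y=y the density of X is 2x/y² on (0,y), so X = y·√V for an independent uniform V. It plugs the estimated moments into the first-order formula for h(x,y) = x/y and compares the result with the sample variance of X/Y. All function names are hypothetical.

```python
import random
import statistics

def sample_xy(rng):
    """Draw (X, Y) from the joint pdf f(x,y) = 6x on 0 < x < y < 1."""
    y = (1.0 - rng.random()) ** (1 / 3)  # Y has density 3y^2; 1-U avoids y = 0
    x = y * rng.random() ** 0.5          # X | Y=y has density 2x/y^2 on (0, y)
    return x, y

def first_order_var(mx, my, vx, vy, cxy):
    """First-order variance approximation for h(x, y) = x/y."""
    hx = 1 / my          # dh/dx at the means
    hy = -mx / my**2     # dh/dy at the means
    return hx**2 * vx + hy**2 * vy + 2 * hx * hy * cxy

rng = random.Random(1)
pairs = [sample_xy(rng) for _ in range(100_000)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
vx, vy = statistics.variance(xs), statistics.variance(ys)
cxy = sum((x - mx) * (y - my) for x, y in pairs) / (len(pairs) - 1)

approx = first_order_var(mx, my, vx, vy, cxy)
simulated = statistics.variance([x / y for x, y in pairs])
print(f"first-order approximation: {approx:.4f}  simulation: {simulated:.4f}")
```

With the exact moments μX = 1/2, μY = 3/4, V[X] = 1/20, V[Y] = 3/80 and cov(X,Y) = 1/40, the call first_order_var(0.5, 0.75, 0.05, 0.0375, 0.025) evaluates the same formula without any simulation error.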

