Recall: If a function h(x) has derivatives of order r, that is if g(r)(x) exists, then for any constant a the Taylor polynomial of order r is defined by
One of the most famous theorems in mathematics called Taylor's theorem states that the remainder of the approximation h(x)-Tr(x) goes to 0 faster than the highest order term:
Taylor's theorem
There are various formulas for the remainder term, but we won't need them here.
Example : say h(x) = log(x+1) and we want to approximate h at x=0. Then we have
The approximation is illustrated in taylor
One application of this is the
Delta Method
Let Yn be a sequence of rv's that satisfy √n(Yn-q)→N(0,s2) in distribution. For a given function g and a specific value of q, suppose that g'(q) exist and and is not 0. Then
√n[g(Yn)-g(q)]→N(0,s[g'(q)])
proof the taylor expansion of g(Yn) around Yn=q is
g(Yn)=g(q)+g'(q)(Yn-q)+R
where R→0 as Yn→q. Now
√n[g(Yn)-g(q) ] =
√n[g(q)+g'(q)(Yn-q)+R-g(q) ] =
g'(q)√n(Yn-q)+√nR → g'(q)X
where X~N(0,s)
Example say we have a sequence of iid rv's X1, ..,Xn, each with mean m≠0 and standard deviations. We know √n(
-m)→N(0,s). Now let g(x)=1/x, then g'(x)=-1/x2 and we get
√n(1/
-1/m)→N(0,s/m2)
say for example Xi~U[0,1], then m=1/2 and s=1/√12, so according to the delta method
√n(1/
-1/m) ~ N(0,s/m2) = N(0,2/√3).
The approximation is shown in deltaex1
For our purposes we will need only first-order approximations (that is using the first derivative) but we will need a multivariate extension as follows: say X1, ..,Xn are r.v. with means m1, .. ,mn and define X=(X1, ..,Xn) and m=(m1, .. ,mn). Suppose there is a differentiable function h(X) for which we want an approximate estimate of the variance. Define
The first order Taylor expansion of h about m is
Forgetting about the remainder we have
and

Example Say we have just one rv X, then the formula simplyfies to

say X~N(m,1) with m large enough so that P(X>0)=1. We want to find V[log(X)]. set h(x)=log(x), then h'(x)=1/x and

check with var(rnorm(1000,10))
Example Say we have two rv's X and Y and X
Y, then the formula simplyfies to

say Xand Y have a geometric distribution with parameters p and r, respectively. We want to approximate the variance of √(X2+Y2)
Now mX=1/p, V[X]=(1-p)/p2, mY=1/r, V[Y]=(1-r)/r2
let h(x,y)=√(x2+y2), then

check varapp2 to see how good the approximation is.
Example : say we have a sample X1, ..,Xn from a Bernoulli r.v. with success parameter p, that is P(X=1)=p=1-P(X=0). One popular measure of the probability of winning a game is the odds p/(1-p). For example when you roll a fair die the odds of getting a six are (1/6)/(1-(1/6) = 1:5.
An obvious estimator for p is
, the sample mean, or here the proportion of "successes" in the n trials. Then an obvious estimator for the odds is
/(1-
). The question is, what is the variance of this estimator?
First note that V[
] = V[1/n∑Xi] = 1/n2 ∑V[Xi] = 1/nV[X1]
EX=0·(1-p)+1·p=p
EX2=02·(1-p)+12·p=p, so
V[X]=p-p2=p(1-p) and so
V[
]=p(1-p)/n
Using the above approximation we get the following: let h(p)=p/(1-p), so h'(p)=1/(1-p)2 and
The routine varapp1 illustrates how good an approximation this is.
Example : let's consider the random vector with joint pdf f(x,y) = 1, 0<x,y< 1. Say we want to find V(X/Y).
Of course X,Y~U[0,1] and X
Y, so
E[X]=E[Y]=1/2, V[X]=V[Y]=1/12 and Cov(X,Y)=0.
Then if we consider the function h(x,y) = x/y we have

How good is this approximation? Running var(runif(10000)/runif(10000)) shows that it is actually very bad! The reason is that occasionally the denominator is very small, so the ratio is very big.
Let's change the problem a little: now f(x,y) = 1, 1<x,y<2, that is X,Y~U[1,2] and X
Y, so
E[X]=E[Y]=3/2, V[X]=V[Y]=1/12 and Cov(X,Y)=0 and then

and this is actually quite good, check with var(runif(10000,1,2)/runif(10000,1,2)). Generally ratios are often trouble!
Example : let's consider the random vector with joint pdf f(x,y) = 6x, 0<x<y< 1. Say we want to find V(X/Y).
First we have

so

and this is quite good, check with
x=rbeta(10000,2,2)
y=runif(10000,x,1)
var(x/y)
Example say we have a rv X geometric with p=0.5. We want to find P(log(X!)>50).
Let's try to solve this problem analytically. First, log(x!) is an increasing function of x, so there exists x50 such that log(x!)>50 iff x>x50, so that P(log(X!)>50)=P(X≥x50). Finding x50 analytically is hopeless, though. We can do it with R by trial and error, using log(factorial(n)) for different values of n. We find n=22.5, so

or about 2.38·10-7
How about an R check? The problem with this is that the probabiility p we want to find is very small, so in a simple simulation as shown in approx_ex1(1) we can expect the outcome of interest only about every 1 in 4.2million runs. In order to get some resonably good estimate we probably need to run the simulation with n=109.
Here is an idea: the problem is that our event of interest, log(X!)>50, is very rare, it almost never happens. Let's instead sample from a distribution Y which has large values much more often, so that log(Y!)>50 happens more often. For example, let's try Y geometric with p=0.05, see approx_ex1(2). It seems P(log(Y!)>50)=0.35, but there is a problem, we get a warning from R: value out of range in 'gammafn'. The reason is that R calculates log(y!) by first calculating y! and then taking log, and it finds y! via the gamma function, but for y bigger than about 170 y! can no longer be found that way. But log(175!)=732.33, not so big at all. We need a different way to calculate log(y!). Here are two ideas:
• log(y!) = log(∏k=1yk)=∑k=1ylog(k)
so we could use the R command sum(log(1:y))
• in logfac(y) I calculate log(y!) directly for y from 0 to 9, and then use
log(y!)=0.918938533205+(y+0.5)*log(y)-y+(1/12-1/(360*y2))/y;
this is based on Stirlings' Approimation:

Both of these work, but the second is much faster, so we will use it.
So P(log(Y!)>50)=0.35. But what good is that? I want X! Well:

so if we sample from Y and find the sum here we still get an estimate of the probability for X. This is shown in approx_ex1(3)
In general we have the following: Let X be a rv' with pmf (pdf) f and and Y a rv' with pmf (pdf) g. Say we want to find E[h(X)]. Then

Note this was done for discrete rv's but it works just as well for continuous ones.
Note how to choose Y? Obviously we need Y such that it can't happen that P(Y=x)>0 and P(X=x)=0. In general we should choose a Y with the same "support" as X, that is P(X=x)>0 iff P(Y=x)>0. It is not necessary to have a Y that "looks like" X. For example in the case above we could have choosen Y with pmf fY(x)=6/(p2x2). See approx_ex1(4). It is also a good idea to choose Y such that the event of interest (log(Y!)>50) happens about 50% of the time.
Example say X, Y and Z have a standard normal distribution. Find P(|XYZ|>K), for example K=10
Now there is no way to do this analytically, and again the probability is very small. So we will use IS with X', Y' and Z' generated from normal distributions with mean 0 and standard deviation s. For our case of K=10 s=3 works good. In general, for some K play around a bit to find a good s. See approx_ex2