Properties of Estimators

We will use the following notation: "X has pmf (pdf) f(x|θ)" means that the pmf (pdf) depends on a parameter θ (which could be a vector). For example, θ=(μ,σ) for the normal distribution.
Say we have X = (X1, ..,Xn) with pmf(pdf) f(x|θ). Then any function of the data T(X) =T(X1, ..,Xn) is called a statistic. If it is meant to estimate θ it is called an estimator of θ.

All these properties are equally important for Bayesians and Frequentists.

Unbiased Estimators

An estimator T is called unbiased for θ if ET(X1, ..,Xn)=θ

Example : say X1, ..,Xn are iid U[0,θ]. Find an unbiased estimator of θ.
We will consider two possible estimators, one based on the sample mean and another based on the maximum:
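The two calculations can be sketched as follows, writing X̄ for the sample mean and M = max{X1, ..,Xn} (both shorthands are introduced here for convenience). The density of M follows from P(M ≤ m) = (m/θ)^n for 0 < m < θ, which holds because the Xi are independent.

```latex
\begin{aligned}
E[\bar X] &= E[X_1] = \frac{\theta}{2}
  &&\Rightarrow\quad T_1 = 2\bar X \text{ is unbiased for } \theta,\\
f_M(m) &= \frac{n\,m^{n-1}}{\theta^{n}},\qquad 0<m<\theta,\\
E[M] &= \int_0^\theta m\,\frac{n\,m^{n-1}}{\theta^{n}}\,dm = \frac{n}{n+1}\,\theta
  &&\Rightarrow\quad T_2 = \frac{n+1}{n}\,M \text{ is unbiased for } \theta.
\end{aligned}
```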

Sufficient Statistics

A statistic T is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ

The meaning of "sufficient statistic" is that all the information about the parameter θ is contained in T, so any inference about θ (such as an estimator, a hypothesis test or a confidence interval) can be based on T.

Theorem: If f(x|θ) is the joint pdf (pmf) of X and q(t|θ) is the pdf (pmf) of T(X), then T(X) is a sufficient statistic for θ if, for every x in the sample space the ratio f(x|θ)/q(t|θ) is constant as a function of θ

Example : say X1, ..,Xn are iid Ber(p) (Here θ=p). Let T(X) = X1+ ..+Xn. T is the number of "successes" in n independent Bernoulli trials, and so T~Bin(n,p). So

We see that the ratio is a constant with respect to p and so T is a sufficient statistic for p
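The computation behind this conclusion can be written out as follows, with t = x1+ ..+xn the observed number of successes:

```latex
\frac{f(x\mid p)}{q(t\mid p)}
  = \frac{\prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}}{\binom{n}{t}\,p^{t}(1-p)^{n-t}}
  = \frac{p^{t}(1-p)^{n-t}}{\binom{n}{t}\,p^{t}(1-p)^{n-t}}
  = \frac{1}{\binom{n}{t}}
```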

Factorization Theorem: Let f(x|θ) be the joint pdf (pmf) of X. A statistic T(X) is a sufficient statistic for θ if and only if there exist functions g(t|θ) and h(x) such that for every x in the sample space and all values of the parameter we have f(x|θ) = g(T(x)|θ)h(x)

Example : say X1, ..,Xn are iid N(μ,σ) and we assume that σ is known, so θ=μ. Then

so we see that the sample mean is a sufficient statistic for the population mean μ, at least if the variance is known.
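One way to carry out the factorization is sketched below. It uses the identity Σ(xi-μ)² = Σ(xi-x̄)² + n(x̄-μ)² and treats σ as the standard deviation; the labels h(x) and g(x̄|μ) match the functions named in the Factorization Theorem.

```latex
\begin{aligned}
f(x\mid\mu) &= (2\pi\sigma^2)^{-n/2}
   \exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big)\\
 &= \underbrace{(2\pi\sigma^2)^{-n/2}
   \exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar x)^2\Big)}_{h(x)}
   \cdot
   \underbrace{\exp\Big(-\frac{n(\bar x-\mu)^2}{2\sigma^2}\Big)}_{g(\bar x\mid\mu)}
\end{aligned}
```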

Ancillary Statistics

A statistic S(X) whose distribution does not depend on θ is called an ancillary statistic.

Example : say X1, ..,Xn are iid U[θ,θ+1] and let R be the range of the observations, that is R = max{X} - min{X}. It can be shown that R ~ Beta(n-1,2) for all θ.
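A small simulation can make the ancillarity of R plausible. The sketch below is illustrative only: the function name simulate_range, the choice n = 5, the values of θ and the number of replications are all assumptions, not part of the notes. It compares the simulated mean of R with the mean (n-1)/(n+1) of a Beta(n-1,2) distribution for several θ.

```python
import random
import statistics


def simulate_range(theta, n, reps, seed=0):
    """Draw `reps` samples of size n from U[theta, theta+1]; return their ranges."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        sample = [theta + rng.random() for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return ranges


n = 5
beta_mean = (n - 1) / (n + 1)  # mean of Beta(n-1, 2)

# The summaries of R should look the same whatever theta is.
for theta in (0.0, 10.0, -3.5):
    r = simulate_range(theta, n, reps=20000, seed=1)
    print(f"theta={theta:6.1f}  mean(R)={statistics.fmean(r):.4f}  "
          f"var(R)={statistics.pvariance(r):.4f}  Beta mean={beta_mean:.4f}")
```

Shifting θ only translates every observation by the same amount, which cancels in max minus min, so the printed summaries should agree up to simulation noise.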

Example : say X1, ..,Xn are N(μ,σ) and let s be the sample standard deviation, then (n-1)s²/σ² ~ χ²(n-1)

Consistency

A sequence of estimators Tn = Tn(X1, ..,Xn) is a consistent sequence of estimators for θ if, for every ε>0 and every θ, we have P(|Tn-θ|<ε) → 1 as n → ∞

Example By the WLLN, if μ=EX exists, the sample mean is a consistent estimator of μ.

Example say X1, .., Xn iid U[0,θ] and T(X)=(n+1)/n·max{X1, .., Xn}. Then
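One way to complete the argument is via Chebyshev's inequality, using that Tn = T(X) is unbiased with V(Tn) = θ²/(n(n+2)) (the variance quoted for this estimator in the Efficiency section). This is a sketch, not necessarily the route originally intended:

```latex
P\big(|T_n-\theta|\ge\varepsilon\big)
  \;\le\; \frac{V(T_n)}{\varepsilon^{2}}
  \;=\; \frac{\theta^{2}}{n(n+2)\,\varepsilon^{2}}
  \;\longrightarrow\; 0 \quad (n\to\infty),
\qquad\text{so}\qquad
P\big(|T_n-\theta|<\varepsilon\big)\to 1 .
```

Hence Tn is a consistent estimator of θ.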

Efficiency

Say we have a sample X1, ..,Xn with pmf(pdf) f(x|θ), and we have two unbiased estimators T1 and T2 of θ. The efficiency of T1 relative to T2 is defined by eff(T1|T2) = V(T1)/V(T2) and we say that T1 is more efficient than T2 if eff(T1|T2)<1.

Example let's look again at the example above: we have X1, ..,Xn iid U[0,θ]. We found that T1 = 2X̄ (twice the sample mean) and T2 = (n+1)/n·max{X1, ..,Xn} are unbiased estimators of θ. Now

so we find that T2 is more efficient than T1 for every value of θ.
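The variance comparison can be sketched as follows, again with M = max{X1, ..,Xn} as a shorthand introduced here and using V(X1) = θ²/12 for the uniform distribution:

```latex
\begin{aligned}
V(T_1) &= 4\,V(\bar X) = \frac{4}{n}\cdot\frac{\theta^{2}}{12} = \frac{\theta^{2}}{3n},\\
E[M^{2}] &= \int_0^\theta m^{2}\,\frac{n\,m^{n-1}}{\theta^{n}}\,dm = \frac{n}{n+2}\,\theta^{2},\\
V(T_2) &= \frac{(n+1)^{2}}{n^{2}}\Big(\frac{n}{n+2}\,\theta^{2} - \frac{n^{2}}{(n+1)^{2}}\,\theta^{2}\Big)
        = \frac{\theta^{2}}{n(n+2)},\\
\mathrm{eff}(T_1\mid T_2) &= \frac{V(T_1)}{V(T_2)} = \frac{n+2}{3}.
\end{aligned}
```

The ratio is larger than 1 for every n > 1 and does not involve θ; for n = 1 the two estimators coincide.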
This is illustrated in effsim1
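A minimal simulation sketch in Python is given below as a hypothetical stand-in; it is not the effsim1 routine itself, and the function name simulate_estimators as well as the values of theta, n and reps are illustrative assumptions. It estimates the mean and variance of both estimators and compares the variances with θ²/(3n) and θ²/(n(n+2)).

```python
import random
import statistics


def simulate_estimators(theta, n, reps, seed=0):
    """Simulate T1 = 2*mean and T2 = (n+1)/n*max for samples from U[0, theta]."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(reps):
        x = [rng.uniform(0, theta) for _ in range(n)]
        t1.append(2 * statistics.fmean(x))
        t2.append((n + 1) / n * max(x))
    return t1, t2


theta, n = 3.0, 10
t1, t2 = simulate_estimators(theta, n, reps=20000, seed=42)
v1, v2 = statistics.pvariance(t1), statistics.pvariance(t2)

# Both estimators should average close to theta; T2 should vary much less.
print(f"mean(T1)={statistics.fmean(t1):.4f}  mean(T2)={statistics.fmean(t2):.4f}")
print(f"var(T1)={v1:.4f}  theory={theta**2 / (3 * n):.4f}")
print(f"var(T2)={v2:.4f}  theory={theta**2 / (n * (n + 2)):.4f}")
print(f"simulated eff(T1|T2)={v1 / v2:.2f}  theory={(n + 2) / 3:.2f}")
```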

Generally it is quite possible that one estimator is more efficient than another only for a subset of the parameter space, and the other one is more efficient on the rest. Also, one estimator might be more efficient if n is small but things are reversed if n is large.

An interesting question is whether in a given problem there is an estimator that is more efficient than any other (unbiased) estimator. At least in the sense of minimum variance, such an estimator would be optimal. In order to answer this question we need the following:

Theorem (Rao-Cramer)
Let X1, ..,Xn be a sample from pmf(pdf) f(x|θ), and let T be any estimator satisfying
d/dθ ET(X) = ∫ ∂/∂θ [T(x) ∏ f(xi|θ)] dx and V(T) < ∞. Then
V(T) ≥ (d/dθ ET(X))² / (n·E[(∂/∂θ log f(X|θ))²])

Note that if T is an unbiased estimator of θ, we have ET(X)=θ and the numerator is just 1.

The right hand side of this inequality is called the Rao-Cramer Lower Bound

The quantity in the denominator (without the n) is called the Fisher information number of the sample.

Under some mild conditions we have
E[(∂/∂θ log f(X|θ))²] = -E[∂²/∂θ² log f(X|θ)]

Example Again let's look at the example above. There we have f(x|θ) = 1/θ, 0 < x < θ. So
∂/∂θ log f(x|θ) = ∂/∂θ (-log θ) = -1/θ and so E[(∂/∂θ log f(X|θ))²] = 1/θ²
so it appears that the Rao-Cramer theorem says that for any unbiased estimator T we have V(T) ≥ θ²/n
but we have already seen that V(T2) = θ²/(n(n+2)) < θ²/n
In fact it can be shown that here the first assumption of the theorem is not satisfied, something that happens quite often, especially if the parameter is part of the boundary condition, such as 0 < x < θ

Example Let X1, ..,Xn be a sample from N(μ,θ) and consider estimating the variance θ, where μ is unknown. The normal pdf satisfies the conditions of the theorem, and so we have
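log f(x|θ) = -½ log(2πθ) - (x-μ)²/(2θ)
∂²/∂θ² log f(x|θ) = 1/(2θ²) - (x-μ)²/θ³
-E[∂²/∂θ² log f(X|θ)] = -1/(2θ²) + θ/θ³ = 1/(2θ²)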
and so any unbiased estimator T of θ (or σ²) must satisfy V(T) ≥ 2θ²/n = 2σ⁴/n

Note that V(S²) = 2σ⁴/(n-1), so the sample variance does not attain the Rao-Cramer lower bound (although for any sizable n it's pretty close)

Example Let X1, ..,Xn be a sample from a Poisson distribution P(λ)
Now

and so the sample mean is the minimum variance unbiased estimator for λ.
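The calculation behind this conclusion can be sketched as follows, assuming P(λ) denotes the Poisson distribution with pmf f(x|λ) = e^{-λ}λ^x/x! and using the second-derivative form of the Fisher information:

```latex
\begin{aligned}
\log f(x\mid\lambda) &= -\lambda + x\log\lambda - \log x!,\\
\frac{\partial^{2}}{\partial\lambda^{2}}\log f(x\mid\lambda) &= -\frac{x}{\lambda^{2}},\\
-E\Big[\frac{\partial^{2}}{\partial\lambda^{2}}\log f(X\mid\lambda)\Big]
   &= \frac{E[X]}{\lambda^{2}} = \frac{1}{\lambda},\\
V(T) &\ge \frac{\lambda}{n} = V(\bar X)
   \quad\text{for every unbiased estimator } T \text{ of } \lambda .
\end{aligned}
```

The sample mean is unbiased for λ and its variance equals the lower bound, so no unbiased estimator can have smaller variance.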

Robustness

In point estimation we start by assuming a parametric model for the data, such as X1, .., Xn~N(μ,σ), and then try to estimate the parameters of the model. But what if our model is wrong, for example if the true model is a t distribution instead of the Normal? A robust estimator is one that does not depend too strongly on the assumed model.

Example : Let X1, .., Xn be iid N(μ,σ). It is known that the sample mean is the best estimator of μ in the sense that it has the smallest variance of all unbiased estimators. But what happens if our assumption of the normal distribution is wrong?
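Let's consider instead a model called the δ-contamination model:

Xi ~ N(μ,σ) with probability 1-δ and Xi ~ f with probability δ

for some other density f. Suppose first we let f be any density with mean θ and variance τ². Then the sample mean X̄ has variance

V(X̄) = (1-δ)σ²/n + δτ²/n + δ(1-δ)(θ-μ)²/n

Now if f is a Cauchy pdf we have τ²=∞ and so the variance of the sample mean is infinite as well!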
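The effect of contamination can be made visible with a small simulation. The sketch below is illustrative only: the function names, the choice μ=0, σ=1, a standard Cauchy as contaminating density, and the values of n, reps and the contamination probabilities are all assumptions. It compares the sample mean with the sample median, using the interquartile range of the simulated estimates as a measure of spread because the variance need not exist.

```python
import math
import random
import statistics


def contaminated_sample(n, delta, rng):
    """n draws: N(0,1) with probability 1-delta, standard Cauchy with probability delta."""
    out = []
    for _ in range(n):
        if rng.random() < delta:
            # Standard Cauchy via the inverse cdf
            out.append(math.tan(math.pi * (rng.random() - 0.5)))
        else:
            out.append(rng.gauss(0.0, 1.0))
    return out


def simulate(delta, n=50, reps=5000, seed=7):
    """Return simulated sample means and sample medians under contamination delta."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        x = contaminated_sample(n, delta, rng)
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
    return means, medians


def iqr(values):
    """Interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


results = {}
for delta in (0.0, 0.1, 0.3):
    means, medians = simulate(delta)
    results[delta] = (means, medians)
    print(f"delta={delta:.1f}  IQR(mean)={iqr(means):.3f}  IQR(median)={iqr(medians):.3f}  "
          f"max|mean|={max(abs(m) for m in means):.1f}  "
          f"max|median|={max(abs(m) for m in medians):.2f}")
```

Without contamination the mean should show the smaller spread, while with contamination the mean occasionally takes extreme values and the median stays close to 0.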

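One way to measure the robustness of an estimator is as follows: Let X(1) < .. < X(n) be the order statistics of the sample and let Tn be a statistic based on this sample. Tn has a breakdown value b, 0≤b≤1, if, for every ε>0

lim Tn < ∞ as X({(1-b)n}) → ∞ and lim Tn = ∞ as X({(1-(b+ε))n}) → ∞

where {a} denotes a rounded to the nearest integer.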

Example : Say Tn is the sample mean. Now Tn = (X1+ ..+Xn)/n = (X(1)+ ..+X(n))/n → ∞ if X({(1-ε)n}) → ∞ for any ε>0 (already X(n) → ∞ alone is enough), and so the sample mean has a breakdown value of 0.

Example : Say Tn is the sample median. Now Tn has a breakdown value of 1/2.
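The two breakdown values can be illustrated numerically. In the sketch below the helper corrupt, the sample 1, .., 21 and the corrupted fractions are hypothetical choices made for illustration: the largest fraction of the observations is replaced by an ever larger value, and we watch what happens to the mean and the median.

```python
import statistics


def corrupt(sample, fraction, value):
    """Replace the largest `fraction` of the sorted sample by `value`."""
    xs = sorted(sample)
    k = round(fraction * len(xs))  # number of observations pushed out to `value`
    return xs[:len(xs) - k] + [value] * k


sample = [float(i) for i in range(1, 22)]  # 1, 2, ..., 21 with median 11

for fraction in (0.05, 0.25, 0.45, 0.55):
    for value in (1e3, 1e6, 1e9):
        xs = corrupt(sample, fraction, value)
        print(f"fraction={fraction:.2f}  value={value:.0e}  "
              f"mean={statistics.fmean(xs):.3g}  median={statistics.median(xs):.3g}")
```

The mean grows without bound as soon as a single observation does, whereas the median stays put until more than half of the observations are moved.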