We will use the following notation: "X has pmf (pdf) f(x|θ)" indicates that the pmf (pdf) depends on a parameter θ (which could be a vector). For example, θ=(μ,σ) for the normal distribution.
Say we have X = (X1, ..,Xn) with pmf (pdf) f(x|θ). Then any function of the data T(X) = T(X1, ..,Xn) is called a statistic. If it is meant to estimate θ it is called an estimator of θ.
All these properties are equally important for Bayesians and Frequentists.
Example : say X1, ..,Xn are iid U[0,θ]. Find an unbiased estimator of θ.
We will consider two possible estimators, one based on the sample mean and another based on the maximum:
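A quick simulation can make the claim of unbiasedness plausible. The sketch below assumes the two estimators are T1 = 2·(sample mean) and T2 = (n+1)/n·max{X1, ..,Xn}, the forms used later in these notes; the function name, θ = 3, n = 10, and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics


def mean_of_estimators(theta=3.0, n=10, reps=20000, seed=1):
    """Monte Carlo averages of T1 = 2*mean(X) and T2 = (n+1)/n*max(X)
    for samples of size n from U[0, theta]."""
    rng = random.Random(seed)
    t1_values, t2_values = [], []
    for _ in range(reps):
        x = [rng.uniform(0.0, theta) for _ in range(n)]
        t1_values.append(2.0 * statistics.fmean(x))
        t2_values.append((n + 1) / n * max(x))
    return statistics.fmean(t1_values), statistics.fmean(t2_values)


mean_t1, mean_t2 = mean_of_estimators()
print(f"average T1 = {mean_t1:.3f}, average T2 = {mean_t2:.3f}, theta = 3.0")
```

Both averages should land close to θ, which is what unbiasedness predicts; the simulation does not replace the calculation of the expected values.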
The meaning of "sufficient statistic" is that all the information about the parameter θ is contained in T, so any inference about θ (such as an estimator, a hypothesis test or a confidence interval) can be based on T.
Theorem: If f(x|θ) is the joint pdf (pmf) of X and q(t|θ) is the pdf (pmf) of T(X), then T(X) is a sufficient statistic for θ if, for every x in the sample space, the ratio f(x|θ)/q(T(x)|θ) is constant as a function of θ.
Example : say X1, ..,Xn are iid Ber(p) (here θ=p). Let T(X) = X1+ ..+Xn. T is the number of "successes" in n independent Bernoulli trials, and so T~Bin(n,p). So with t = x1+ ..+xn we have
f(x|p)/q(t|p) = p^t(1-p)^(n-t) / [C(n,t)·p^t(1-p)^(n-t)] = 1/C(n,t), where C(n,t) = n!/(t!(n-t)!)
We see that the ratio is a constant with respect to p and so T is a sufficient statistic for p.
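The ratio in the Bernoulli example can also be checked numerically. The following sketch evaluates f(x|p)/q(t|p) for one fixed data vector and several values of p; the function name `ratio` and the chosen data vector are hypothetical.

```python
from math import comb


def ratio(x, p):
    """Joint Bernoulli pmf of x divided by the Bin(n, p) pmf of t = sum(x)."""
    n, t = len(x), sum(x)
    joint = p ** t * (1 - p) ** (n - t)
    binomial = comb(n, t) * p ** t * (1 - p) ** (n - t)
    return joint / binomial


x = (1, 0, 1, 0)  # n = 4 trials with t = 2 successes
for p in (0.2, 0.5, 0.9):
    print(f"p = {p}: ratio = {ratio(x, p):.6f}")
```

For every p the ratio equals 1/C(4,2) = 1/6, so it carries no information about p.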
Factorization Theorem: Let f(x|θ) be the joint pdf (pmf) of X. A statistic T(X) is a sufficient statistic for θ if and only if there exist functions g(t|θ) and h(x) such that for every x in the sample space and all values of the parameter we have f(x|θ) = g(T(x)|θ)h(x).
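Example : say X1, ..,Xn are iid N(μ,σ) and we assume that σ is known, so θ=μ. Then, writing x̄ for the sample mean and using Σ(xi-μ)² = Σ(xi-x̄)² + n(x̄-μ)²,
f(x|μ) = (2πσ²)^(-n/2)·exp{-Σ(xi-μ)²/(2σ²)} = [(2πσ²)^(-n/2)·exp{-Σ(xi-x̄)²/(2σ²)}]·exp{-n(x̄-μ)²/(2σ²)} = h(x)·g(x̄|μ)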
so we see that the sample mean is a sufficient statistic for the population mean μ, at least if the variance is known.
Example : say X1, ..,Xn are iid U[θ,θ+1] and let R be the range of the observations, that is R = max{X} - min{X}. It can be shown that R ~ Beta(n-1,2) for all θ.
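That the distribution of R does not depend on θ can be illustrated by simulation. The sketch below compares the average range for several values of θ with the mean of a Beta(n-1,2) distribution, which is (n-1)/(n+1); the function name, the values of θ, and n = 5 are arbitrary illustrative choices, and independent observations are assumed.

```python
import random


def mean_range(theta, n=5, reps=20000, seed=3):
    """Average of R = max(X) - min(X) over samples of size n from U[theta, theta+1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.uniform(theta, theta + 1.0) for _ in range(n)]
        total += max(x) - min(x)
    return total / reps


n = 5
beta_mean = (n - 1) / (n + 1)  # mean of Beta(n-1, 2)
for seed, theta in enumerate((0.0, 7.5, -100.0), start=10):
    print(f"theta = {theta}: mean range = {mean_range(theta, n, seed=seed):.4f}, "
          f"Beta mean = {beta_mean:.4f}")
```

A different seed is used for each θ so that the agreement is not an artifact of reusing the same random numbers.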
Example : say X1, ..,Xn are N(μ,σ) and let s be the sample standard deviation, then (n-1)s²/σ² ~ χ²(n-1).
Example : By the WLLN, if μ=EX exists, the sample mean is a consistent estimator of μ.
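Example : say X1, .., Xn are iid U[0,θ] and T(X)=(n+1)/n·max{X1, .., Xn}. Then for any 0<ε<θ we have P(max{X1, .., Xn} ≤ θ-ε) = ((θ-ε)/θ)^n → 0 as n → ∞, so max{X1, .., Xn} → θ in probability. Because (n+1)/n → 1, T(X) is a consistent estimator of θ.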
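Example : let's look again at the example above: we have X1, ..,Xn iid U[0,θ]. We found that T1 = 2X̄ (twice the sample mean) and T2 = (n+1)/n·max{X1, ..,Xn} are unbiased estimators of θ. Now V(T1) = 4V(X1)/n = 4(θ²/12)/n = θ²/(3n) and V(T2) = θ²/(n(n+2)) ≤ θ²/(3n), with strict inequality for n > 1,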
so we find that T2 is more efficient than T1 for every value of θ.
This is illustrated in effsim1
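The routine effsim1 is not reproduced in these notes. As a stand-in, the following sketch (an assumption about its spirit, not the original code) estimates the variances of T1 = 2·(sample mean) and T2 = (n+1)/n·max{X1, ..,Xn} by simulation for a few values of θ and compares them with θ²/(3n) and θ²/(n(n+2)).

```python
import random


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def estimator_variances(theta, n=10, reps=20000, seed=2):
    """Simulated variances of T1 = 2*mean(X) and T2 = (n+1)/n*max(X) for U[0, theta]."""
    rng = random.Random(seed)
    t1_values, t2_values = [], []
    for _ in range(reps):
        x = [rng.uniform(0.0, theta) for _ in range(n)]
        t1_values.append(2.0 * sum(x) / n)
        t2_values.append((n + 1) / n * max(x))
    return sample_variance(t1_values), sample_variance(t2_values)


n = 10
results = {}
for theta in (0.5, 3.0, 20.0):
    v1, v2 = estimator_variances(theta, n)
    results[theta] = (v1, v2)
    print(f"theta = {theta}: V(T1) ~ {v1:.4f} (theory {theta**2 / (3 * n):.4f}), "
          f"V(T2) ~ {v2:.4f} (theory {theta**2 / (n * (n + 2)):.4f})")
```

In each case the simulated variance of T2 should be clearly smaller than that of T1, in line with the ratio 3/(n+2) of the theoretical variances.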
Generally it is quite possible that one estimator is more efficient than another only for a subset of the parameter space, and the other one is more efficient on the rest. Also, one estimator might be more efficient if n is small but things are reversed if n is large.
An interesting question is whether in a given problem there is an estimator that is more efficient than any other (unbiased) estimator. At least in the sense of minimum variance, such an estimator would be optimal. In order to answer this question we need the following:
Theorem (Rao-Cramer)
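Let X1, ..,Xn be an iid sample from pmf (pdf) f(x|θ), and let T be any estimator satisfying
(1) d/dθ E[T(X)] = ∫ ∂/∂θ [T(x)f(x|θ)] dx, that is, differentiation and integration can be interchanged, and
(2) V(T) < ∞.
Then
V(T) ≥ (d/dθ E[T(X)])² / (n·E[(∂/∂θ log f(X|θ))²])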
Note that if T is an unbiased estimator of θ, we have ET(X)=θ and the numerator is just 1.
The right hand side of this inequality is called the Rao-Cramer Lower Bound.
The quantity E[(∂/∂θ log f(X|θ))²] in the denominator (without the n) is called the Fisher information number of a single observation; multiplied by n it is the Fisher information of the sample.
Under some mild conditions we have E[(∂/∂θ log f(X|θ))²] = -E[∂²/∂θ² log f(X|θ)], which is often easier to compute.
Example : Again let's look at the example above. There we have f(x|θ) = 1/θ, 0 < x < θ. So log f(x|θ) = -log θ, ∂/∂θ log f(x|θ) = -1/θ and E[(∂/∂θ log f(X|θ))²] = 1/θ²,
so it appears that the Rao-Cramer theorem says that for any unbiased estimator T we have V(T) ≥ θ²/n,
but we have already seen that V(T2) = θ²/(n(n+2)) < θ²/n.
In fact it can be shown that here the first assumption of the theorem is not satisfied, something that happens quite often, especially if the parameter is part of the boundary condition, such as 0 < x < θ.
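Example : Let X1, ..,Xn be a sample from N(μ,θ), where θ here denotes the variance, and consider estimating θ when μ is unknown. The normal pdf satisfies the conditions of the theorem, and so, using the second-derivative form of the Fisher information, we have
log f(x|θ) = -½·log(2πθ) - (x-μ)²/(2θ), ∂²/∂θ² log f(x|θ) = 1/(2θ²) - (x-μ)²/θ³, -E[∂²/∂θ² log f(X|θ)] = -1/(2θ²) + θ/θ³ = 1/(2θ²)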
and so any unbiased estimator T of θ (or σ²) must satisfy V(T) ≥ 2θ²/n = 2σ⁴/n.
Note that V(S²) = 2σ⁴/(n-1), so the sample variance does not attain the Rao-Cramer lower bound (although for any sizable n it's pretty close).
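The gap between the variance of the sample variance and the lower bound can be seen in a small simulation. The sketch below draws normal samples, computes the sample variance S² of each, and compares the simulated variance of S² with 2σ⁴/(n-1) and with the bound 2σ⁴/n; σ = 2, n = 10, and the function name are illustrative assumptions.

```python
import random


def variance_of_s2(sigma=2.0, n=10, reps=20000, seed=4):
    """Simulated variance of the sample variance S^2 for N(0, sigma) samples of size n."""
    rng = random.Random(seed)
    s2_values = []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(x) / n
        s2_values.append(sum((xi - m) ** 2 for xi in x) / (n - 1))
    mean_s2 = sum(s2_values) / reps
    return sum((s - mean_s2) ** 2 for s in s2_values) / (reps - 1)


sigma, n = 2.0, 10
simulated = variance_of_s2(sigma, n)
theoretical = 2 * sigma ** 4 / (n - 1)  # exact variance of S^2 under normality
bound = 2 * sigma ** 4 / n              # Rao-Cramer lower bound
print(f"simulated V(S^2) = {simulated:.3f}, theory = {theoretical:.3f}, bound = {bound:.3f}")
```

The simulated value should sit near the theoretical one and above the bound, and the relative gap 1/(n-1) shrinks as n grows.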
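Example : Let X1, ..,Xn be a sample from a Poisson distribution P(λ).
Now log f(x|λ) = -λ + x·log λ - log x!, so ∂/∂λ log f(x|λ) = x/λ - 1 and E[(X/λ - 1)²] = V(X)/λ² = 1/λ. The Rao-Cramer lower bound for unbiased estimators of λ is therefore λ/n. The sample mean is unbiased for λ and has variance V(X̄) = λ/n, so it attains the bound,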
and so the sample mean is the minimum variance unbiased estimator for λ.
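The Fisher information in the Poisson example can be checked numerically by summing over the pmf. The sketch below uses the score ∂/∂λ log f(x|λ) = x/λ - 1; truncating the sum at 200 terms, and the values λ = 2.5 and n = 20, are assumptions that are harmless for moderate λ.

```python
from math import exp, lgamma, log


def poisson_pmf(x, lam):
    """Poisson pmf, computed on the log scale for numerical stability."""
    return exp(-lam + x * log(lam) - lgamma(x + 1))


def fisher_information(lam, upper=200):
    """E[(X/lam - 1)^2] for X ~ Poisson(lam), summed over x = 0, ..., upper-1."""
    return sum(poisson_pmf(x, lam) * (x / lam - 1.0) ** 2 for x in range(upper))


lam, n = 2.5, 20
info = fisher_information(lam)
lower_bound = 1.0 / (n * info)  # Rao-Cramer bound for unbiased estimators
variance_of_mean = lam / n      # variance of the sample mean
print(f"information = {info:.6f}, bound = {lower_bound:.6f}, V(mean) = {variance_of_mean:.6f}")
```

The information comes out as 1/λ, and the bound coincides with the variance of the sample mean.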
Example : Let X1, .., Xn be iid N(μ,σ). It is known that the sample mean is the best estimator of μ in the sense that it has the smallest variance of all unbiased estimators. But what happens if our assumption of the normal distribution is wrong?
Let's consider instead a model called the δ-contamination model:
Xi ~ N(μ,σ) with probability 1-δ and Xi ~ f(x) with probability δ
for some other density f. Suppose first we let f be any density with mean θ and variance τ². Then
V(X̄) = (1-δ)σ²/n + δτ²/n + δ(1-δ)(θ-μ)²/n
Now if f is a Cauchy pdf we have τ=∞ and so the variance of the sample mean is infinite as well!
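The effect of contamination on the sample mean can be explored by simulation in the finite-variance case. The sketch below assumes, purely for illustration, that the contaminating density is itself normal with mean θ and standard deviation τ, and compares the simulated variance of the sample mean with the standard variance formula for a two-component mixture, [(1-δ)σ² + δτ² + δ(1-δ)(θ-μ)²]/n. All parameter values and function names are hypothetical.

```python
import random


def mixture_variance_of_mean(delta, mu=0.0, sigma=1.0, theta=2.0, tau=5.0, n=20):
    """Variance of the sample mean under the contamination model (mixture formula)."""
    per_observation = ((1 - delta) * sigma ** 2 + delta * tau ** 2
                       + delta * (1 - delta) * (theta - mu) ** 2)
    return per_observation / n


def simulated_variance_of_mean(delta, mu=0.0, sigma=1.0, theta=2.0, tau=5.0,
                               n=20, reps=20000, seed=5):
    """Monte Carlo variance of the sample mean; contaminant assumed N(theta, tau)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n):
            if rng.random() < delta:
                total += rng.gauss(theta, tau)  # contaminated observation
            else:
                total += rng.gauss(mu, sigma)   # observation from the assumed model
        means.append(total / n)
    grand_mean = sum(means) / reps
    return sum((m - grand_mean) ** 2 for m in means) / (reps - 1)


for delta in (0.0, 0.1):
    print(f"delta = {delta}: simulated = {simulated_variance_of_mean(delta):.4f}, "
          f"formula = {mixture_variance_of_mean(delta):.4f}")
```

Even 10% contamination with a wide distribution inflates the variance of the sample mean several-fold here, and the inflation grows without bound as τ increases.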
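One way to measure the robustness of an estimator is as follows: Let X(1), .., X(n) be the order statistics of the sample and let Tn be a statistic based on this sample. Tn has a breakdown value b, 0≤b≤1, if, for every ε>0
lim Tn < ∞ as X({(1-b)n}) → ∞ and lim Tn = ∞ as X({(1-(b+ε))n}) → ∞
where {a} denotes a rounded to the nearest integer.
Example : Say Tn is the sample mean. Now Tn = (X1+ ..+ Xn)/n = (X(1)+ ..+ X(n))/n → ∞ if X(n) → ∞, which is the case whenever X({(1-ε)n}) → ∞ because X(n) ≥ X({(1-ε)n}), and so the sample mean has a breakdown value of 0.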
Example : Say Tn is the sample median. Now Tn has a breakdown value of 1/2.
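The difference between the two breakdown values shows up in a small deterministic experiment. The sketch below replaces the k largest observations of a sample of size 11 by a very large number and recomputes the mean and the median; the helper `contaminate` and the data are hypothetical.

```python
import statistics


def contaminate(sample, k, value):
    """Return the sorted sample with its k largest observations replaced by `value`."""
    ordered = sorted(sample)
    return ordered[:len(ordered) - k] + [value] * k


base = [float(i) for i in range(1, 12)]  # 1, 2, ..., 11 with mean 6 and median 6
for k in (0, 1, 5, 6):
    bad = contaminate(base, k, 1e9)
    print(f"k = {k}: mean = {statistics.fmean(bad):.3g}, median = {statistics.median(bad):.3g}")
```

A single extreme observation is enough to carry the mean away, while the median stays at 6 until more than half of the observations (here 6 out of 11) are moved.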