Estimation

Point Estimation

Here is the type of problem we often see in Statistics:

Example We want to know the percentage p of students at the Colegio who like the food in the Cafeteria. p is the percentage for all students, that is for the whole population, so p is a parameter. For each student there are two possibilities: he/she likes the food or he/she does not like the food. So randomly selecting a student is a Bernoulli trial. If we randomly select n students, then the number X of students who like the food at the Cafeteria has a Binomial distribution, that is X~Bin(n,p). But we already know that

• if X~Bin(n,p) then m=np

If we did a good job when selecting the students for the sample, then we would hope that the observed count X is close to its population mean m, that is

X ≈ m = np

and so we can estimate the parameter p by X/n, the sample percentage.

What we have done is the following:

• we decided that the population can be described by a certain random variable, but without knowing the parameter
• formulas let us compute the parameter (m=np)
• Statistics lets us find the corresponding statistic (X)
• combine the two to estimate the parameter (p≈X/n)
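The steps above can be sketched in a few lines of Python. This is only an illustration: the function names, the "true" percentage 0.4 and the sample size 200 are made up for the sketch and do not come from a real survey.

```python
import random

def simulate_survey(true_p, n, seed=0):
    """Simulate n Bernoulli trials: 1 = likes the food, 0 = does not.

    true_p is a hypothetical population value, used only to generate data.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < true_p else 0 for _ in range(n)]

def estimate_proportion(sample):
    """Point estimate of p: X/n, where X is the number of 1s in the sample."""
    x = sum(sample)
    n = len(sample)
    return x / n

if __name__ == "__main__":
    sample = simulate_survey(true_p=0.4, n=200, seed=1)
    # The estimate should be close to 0.4, but not exactly equal to it
    print(estimate_proportion(sample))
```

In a real survey we would of course not know true_p; only the sample and the estimate X/n would be available.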

Each population parameter has a corresponding statistic and vice versa.
Sample mean - population mean
Sample percentage - population percentage
Sample median - population median
Sample standard deviation - population standard deviation
Sample 1st quartile - population 1st quartile
Sample correlation coefficient - population correlation coefficient
Sample least squares coefficients - population least squares coefficients
etc.

Each of these sample numbers is called a point estimate.

Interval Estimation

In real life a point estimate is rarely enough. Usually we also need some estimate of the error in our estimate.

Example Again we did a survey of the undergraduate students at the Colegio. For that we interviewed 150 randomly selected students, and found a (sample) mean GPA of 2.53. We really want to know the (population) mean GPA of all the undergraduates at the Colegio. Now the "2.53" is the mean GPA specifically for the sample we collected. If we repeated the whole process and found a different sample, we would also get a different sample mean. Let's say the (population) mean GPA for all the undergraduates is mGPA. It is pretty clear that mGPA is not exactly 2.53, but hopefully mGPA is close to 2.53. Only, how close?

One way to answer such questions is to find an interval estimate rather than a point estimate. Specifically we will consider a type of interval estimate called a confidence interval.

We will learn about confidence intervals using the mean m as an example. Here the formal definition is
A 100(1-a)% confidence interval for the population mean m is given by

$$\bar{X} \pm t_{n-1,\,a/2}\,\frac{s}{\sqrt{n}}$$

where X̄ is the sample mean, s is the sample standard deviation and n is the sample size.

First notice that the interval is given in the form point estimate ± error, which is quite often true in Statistics although not always.

We already know all the ingredients of this formula with the exception of the critical value tn-1,a/2. We will find the needed values from the table of critical values.

Note: In practical applications values of tn-1,a/2 are somewhere between 1.5 and 3!

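Example Say in our survey of 150 students we found a sample mean GPA of 2.53 with a standard deviation of 0.65. Find a 90% confidence interval for the mean GPA:
100(1-a)% = 90% , so 1-a=0.90, a=0.10, a/2=0.05, t149,0.05 = 1.655
error = 1.655*0.65/√150 = 0.088

and so our 90% confidence interval is (2.53-0.088, 2.53+0.088) = (2.442, 2.618)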
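The arithmetic of this example can be checked with a short Python sketch. The function name t_interval is made up for illustration, and the critical value 1.655 (149 degrees of freedom, a/2=0.05) is typed in from a table of critical values rather than computed.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (low, high) for the interval mean ± t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Assumption: table value for 149 degrees of freedom and a/2 = 0.05
T_149_005 = 1.655

low, high = t_interval(mean=2.53, sd=0.65, n=150, t_crit=T_149_005)
print(round(low, 3), round(high, 3))  # prints 2.442 2.618
```

Calling the same function with a different critical value gives the interval for another confidence level.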

What does that mean: a 90% confidence interval for the mean is (2.442, 2.618)? The interpretation is this: suppose that over the next year statisticians (and other people using statistics) all over the world compute 100,000 90% confidence intervals, many for the mean, others maybe for medians or standard deviations or .... Then about 90%, or about 90,000, of those intervals will actually contain the parameter that is supposed to be estimated; the other 10,000 or so will not.
It is tempting to interpret the confidence interval as follows: having found our 90% confidence interval of (2.442, 2.618), we are now 90% sure that the true mean GPA (the one for all the students at the Colegio) is somewhere between 2.442 and 2.618. Strictly speaking this interpretation is not correct, because once we have computed the interval (2.442, 2.618) the true mean GPA is either in it or not. Nevertheless, at least intuitively, this interpretation is also useful.

Let's do a simulation to illustrate all this:

First we need some data. We are going to generate data with 50 observations from a N(10,3) as follows:
Calc > Random Data > Normal, generate 50 rows of data, store in c1, mean 10, standard deviation 3
Then I compute the 90% confidence interval for this dataset:
Stat > Basic Statistics > 1-sample-t, Samples in column c1, Options > Confidence Level: 90
and check whether it contains the true parameter mean 10.

As always with simulations we want to repeat this many times. This is done in the MACRO ci. Run it as follows:
Call three columns low high OK
CTRL-l, %K:\3015\ci 90 50 10 3 'low' 'high' 'OK'

Here 90 is the confidence level (90%), 50 is the sample size, 10 is the mean and 3 is the standard deviation.
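For readers without access to the macro, here is a rough Python analogue of the same experiment. It is a sketch, not a translation of the macro ci: the function name coverage, the seed and the number of repetitions are made up, and the critical value 1.677 (49 degrees of freedom, a/2=0.05) is typed in from a table.

```python
import math
import random
import statistics

# Assumption: table value for 49 degrees of freedom and a/2 = 0.05
T_49_005 = 1.677

def coverage(reps, n, mu, sigma, t_crit, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Generate n observations from a normal with mean mu, standard deviation sigma
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(data)
        s = statistics.stdev(data)  # sample standard deviation
        half_width = t_crit * s / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    # Should come out close to 0.90, but will vary from run to run with the seed
    print(coverage(reps=1000, n=50, mu=10, sigma=3, t_crit=T_49_005))
```

Each pass through the loop plays the role of one row of the columns low, high and OK.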

But why would we be willing to accept a 10% chance of being "wrong", that is of getting an interval that does not contain the true parameter? Well, we don't have to. After all, we chose to compute a 90% confidence interval. Instead we could have found a 99% confidence interval and left only a 1% chance of being "wrong". Here is what would happen:
100(1-a)% = 99% , so 1-a=0.99, a=0.01, a/2=0.005, t149,0.005 = 2.609
error = 2.609*0.65/√150 = 0.138, and so the 99% confidence interval is (2.53-0.138, 2.53+0.138) = (2.392, 2.668)

So this interval is wider: it has a width of 2*0.138=0.276 compared to a width of 2*0.088=0.176 for the 90% confidence interval.
In the next graph we have four confidence intervals, for different confidence levels:

So finding confidence intervals involves a trade-off: if we make the probability of being wrong smaller we (almost always) make the interval larger.
The only way to make an interval smaller without changing the confidence level is to get a larger data set!
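Both effects can be seen numerically with the following sketch. To avoid typing in a table, it uses the normal quantile from Python's statistics module as a large-sample stand-in for the t critical value. This is an approximation, so the numbers come out slightly smaller than the t-based ones. The function name and the sample size 600 are made up for illustration.

```python
import math
from statistics import NormalDist

def approx_half_width(sd, n, confidence):
    """Approximate error term z * sd / sqrt(n).

    Assumption: the normal quantile z replaces the t critical value,
    which is reasonable only for fairly large n.
    """
    a = 1 - confidence
    z = NormalDist().inv_cdf(1 - a / 2)
    return z * sd / math.sqrt(n)

if __name__ == "__main__":
    for confidence in (0.90, 0.99):
        for n in (150, 600):
            print(confidence, n, round(approx_half_width(0.65, n, confidence), 3))
```

Raising the confidence level from 90% to 99% widens the interval, while going from 150 to 600 observations (four times as many) cuts the error term in half, because the error shrinks like 1/√n.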

Simultaneous Inference

One major difficulty in properly interpreting confidence intervals (and later hypothesis tests) arises when we find more than one of them. If we have several 90% confidence intervals the 90% coverage applies to each individually but not to the whole collection simultaneously.

Example When we run the simulation above each interval individually has a 10% chance of not containing m, but for the collection of 100 intervals it is virtually certain that some will not contain m.

Example If you flip a coin once the probability of heads is 0.5. If you flip a coin 10 times the probability of at least one heads is 1-0.5^10 = 0.999. Standard statistical inference assumes you only do one analysis (like calculating a confidence interval).

Example Say a biologist wants to estimate the number of birds that migrate to the south. So he sits in his chair and watches the sky every day, say from 10am to 11am. Based on the number of birds he sees flying by he calculates a 90% confidence interval for the true number of birds that flew south on that day. On any one day there is about a 10% chance that his interval is wrong. But if he watches for two or three weeks, it is very likely that at least one of all these intervals is wrong: if the days are independent the probability is 1-0.9^14 = 0.77 after two weeks and 1-0.9^21 = 0.89 after three weeks.
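The probabilities in these examples all follow the same rule, which can be sketched as below. The sketch assumes that the intervals are independent of each other, and the function name is made up for illustration.

```python
def prob_at_least_one_miss(confidence, k):
    """Probability that at least one of k independent intervals misses its parameter.

    Assumption: the k intervals are independent, each with the same coverage.
    """
    return 1 - confidence ** k

if __name__ == "__main__":
    for k in (1, 14, 21, 100):
        print(k, round(prob_at_least_one_miss(0.90, k), 3))
```

For a single 90% interval the chance of a miss is 0.1, but it grows quickly with the number of intervals and is practically 1 for 100 of them.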

For more on confidence intervals see page 350 of the textbook.