Example: Consider the data on the Sex Ratios by State. Here we want to know the true ratio for each state, a parameter. We could find it by doing a census, that is, by counting all the men and women in each state and calculating the ratio. Instead we take a sample and calculate the ratio from the sample, thereby finding a statistic. For example, for Alaska the estimate is 103.2, which means that for every 100 women there are about 103 men.
• Does this mean that there are more men than women in Alaska?
To answer this we need to know the error of this estimate. For this we will find a confidence interval. So a statement we will see quite often is something like this:
• A 90% confidence interval for the sex ratio in Alaska is (99.6, 106.8)
Note that 100 is inside this interval, so it is quite possible that there are equally many men and women in Alaska.
The interpretation is this: suppose that over the next year statisticians (and other people using statistics) all over the world compute 100,000 90% confidence intervals, some for means, others maybe for medians or standard deviations or ... Then about 90%, or about 90,000, of those intervals will actually contain the parameter they are supposed to estimate; the other 10,000 or so will not.
Example: For Washington DC the 90% confidence interval is (88.1, 88.9). 100 is not in this interval, so we can be quite sure (at least 90% sure, actually) that there are fewer men than women in Washington DC.
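The reasoning used for Alaska and Washington DC can be written as a small check. This is only a sketch in Python (the course itself uses MINITAB), and the function name compare_to_100 is made up for this illustration:

```python
def compare_to_100(low, high):
    """Say what a confidence interval (low, high) for a sex ratio
    tells us about the value 100 (equally many men and women)."""
    if low > 100:
        return "more men than women"
    if high < 100:
        return "fewer men than women"
    # 100 is inside the interval, so we cannot rule out equal numbers
    return "equal numbers are plausible"

# The two 90% intervals quoted in the text
print("Alaska:", compare_to_100(99.6, 106.8))
print("Washington DC:", compare_to_100(88.1, 88.9))
```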
Let's do a simulation to illustrate all this:
First we need some data. We are going to generate data with 50 observations from a N(10,3) as follows:
Calc > Random Data > Normal, generate 50 rows of data, store in c1, mean 10, standard deviation 3
Then I compute the 90% confidence interval for this dataset:
Stat > Basic Statistics > 1-sample-t, Samples in column c1, Options > Confidence Level: 90
and check whether it contains the true parameter mean 10.
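For readers without MINITAB, the same single run can be sketched in Python using only the standard library. This is not what MINITAB does internally, just the usual one-sample t interval, mean ± t × s/√n. The critical value 1.6766 (t distribution with 49 degrees of freedom, 90% level) is taken from a t table and only fits a sample of size 50; the function name t_interval and the seed are choices made for this illustration.

```python
import math
import random
import statistics

# Assumption: critical value of the t distribution with 49 degrees of
# freedom for a 90% interval, read from a t table (valid for n = 50 only).
T_CRIT_90_DF49 = 1.6766

def t_interval(data, t_crit):
    """Return (low, high) of the one-sample t confidence interval for the mean."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

random.seed(1)  # arbitrary seed, only so the run can be repeated
sample = [random.gauss(10, 3) for _ in range(50)]  # 50 observations from N(10, 3)

low, high = t_interval(sample, T_CRIT_90_DF49)
print(f"90% confidence interval: ({low:.2f}, {high:.2f})")
print("contains the true mean 10:", low <= 10 <= high)
```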
As always with simulations we want to repeat this many times. This is done in the MINITAB MACRO ci. Run it as follows:
Call three columns low high OK
CTRL-l, %K:\3102\ci 90 50 10 3 'low' 'high' 'OK'
Here 90 is the confidence level (90%), 50 is the sample size, 10 is the mean and 3 is the standard deviation.
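The source of the macro is not shown here, so the following Python sketch is only a guess at the idea behind it: repeat the experiment many times, keep the lower and upper limit of each interval, and record whether the interval contains the true mean. The function name simulate_ci, the number of runs (1000), the seed and the 0/1 coding of OK are all assumptions of this sketch, and the t critical value 1.6766 (49 degrees of freedom, 90% level) again comes from a t table.

```python
import math
import random
import statistics

def simulate_ci(t_crit, n, mu, sigma, runs=1000, seed=None):
    """Draw `runs` samples of size n from N(mu, sigma) and compute a
    t interval for each. Returns three lists: low, high and ok, where
    ok is 1 if the interval contains mu and 0 otherwise."""
    rng = random.Random(seed)
    lows, highs, oks = [], [], []
    for _ in range(runs):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.mean(data)
        half = t_crit * statistics.stdev(data) / math.sqrt(n)  # half-width
        lows.append(mean - half)
        highs.append(mean + half)
        oks.append(1 if mean - half <= mu <= mean + half else 0)
    return lows, highs, oks

# Same settings as in the text: 90% level, n = 50, mean 10, standard deviation 3
low, high, ok = simulate_ci(t_crit=1.6766, n=50, mu=10, sigma=3, runs=1000, seed=2)
print("fraction of intervals containing the true mean:", sum(ok) / len(ok))
```

If the intervals work as advertised, the printed fraction should be close to 0.90, though it will vary a little from run to run.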
By the way, according to the 2000 US Census the sex ratio in Puerto Rico is 92.7 men per 100 women, compared to 96.5 men per 100 women in the US in general. Because these numbers come from a census of the whole population they are parameters, and so they don't have errors or confidence intervals. For this and other facts about the US and Puerto Rico look at http://factfinder.census.gov.
For more on confidence intervals go to here or see page 319 of the textbook.