Prediction

Predicting y from x

If we have an equation relating x and y, we can use it to make a guess (an "estimate") of y for a known value of x.

Example Let's consider the Quality of Fish data. Use this data set to estimate the quality of a fish that was put into ice 4 hours after being caught.
Using Stat > Regression > Regression, Response=Quality, Predictor=Time, we find the least-squares regression (LSR) line to be Quality = 8.46 - 0.142 Time, so the estimate is Quality = 8.46 - 0.142×4 = 7.892 ≈ 7.9.
We can also let MINITAB do the calculation for us:
Stat > Regression > Regression, Response=Quality, Predictor=Time, Options - Prediction interval=4. The output shows Fit = 7.8933; the small difference from 7.9 comes from rounding the coefficients above.
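The hand calculation above is just the fitted line evaluated at Time = 4. A minimal Python sketch of that arithmetic, using the rounded coefficients 8.46 and -0.142 quoted above (the function name `predict` is hypothetical, not part of MINITAB):

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Evaluate the fitted line y-hat = intercept + slope * x."""
    return intercept + slope * x

# Rounded LSR coefficients for Quality vs. Time from the notes
quality_hat = predict(8.46, -0.142, 4)
print(round(quality_hat, 3))  # 7.892
```

MINITAB's 7.8933 differs slightly because it works with the unrounded coefficients.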

Confidence vs. Prediction Intervals

When making an estimate, we usually also want an idea of the "error" in it. In 3101 we used confidence intervals for this. Here we again use confidence intervals, but now there are two types:

Confidence Interval (CI): used to predict the mean response of many observations with the desired x value
Prediction Interval (PI): used to predict the response of one individual observation with the desired x value

Warning The terminology is a little confusing here, because the same term means different things. Both the confidence intervals and the prediction intervals found by the regression command are confidence intervals in the sense discussed in 3101, and both are used for prediction! They differ in what they try to predict: an individual response (PI) or the mean of many responses (CI).
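For simple linear regression, the two intervals at a chosen value x_0 are usually written as below. These are the standard textbook formulas; the notation is an assumption here, not taken from these notes: n is the sample size, s the residual standard error, x̄ the mean of the observed x values, S_xx the sum of squared deviations of x from x̄, and t_{α/2, n-2} the t critical value with n - 2 degrees of freedom. The only difference is the extra 1 under the square root of the PI, which accounts for the scatter of a single observation around the mean.

```latex
\hat{y}_0 = b_0 + b_1 x_0, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

\text{CI: } \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

\text{PI: } \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```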

Example Let's consider the Quality of Fish data. Use this data set to find a 90% interval estimate for the quality of a fish that was put into storage after 4 hours.
We are talking about one fish, so we want a prediction interval:
Stat > Regression > Regression, Response=Quality, Predictor=Time, Options - Prediction interval=4, Conf level=90.
The 90% prediction interval is then (7.6556, 8.1311).

Example Again consider the Quality of Fish data. Use this data set to find a 90% interval estimate for the mean quality of fish that were put into storage after 4 hours.
Now we are interested in the mean rating of many fish, so we want a confidence interval:
Stat > Regression > Regression, Response=Quality, Predictor=Time, Options - Prediction interval=4, Conf level=90.
(Note that this is exactly the same command as before: MINITAB reports both intervals in the same output.)
A 90% confidence interval for the mean rating of fish after 4 hours is (7.8149, 7.9718).
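Both intervals are symmetric around the same fitted value, so each can be read as fit ± margin. A small Python sketch (the helper `center_and_margin` is hypothetical) that recovers the center and margin from the endpoints reported above:

```python
def center_and_margin(lower: float, upper: float) -> tuple[float, float]:
    """Split a symmetric interval into (center, margin of error)."""
    return (lower + upper) / 2, (upper - lower) / 2

# 90% intervals at Time = 4, as reported in the examples above
pi_fit, pi_margin = center_and_margin(7.6556, 8.1311)  # prediction interval
ci_fit, ci_margin = center_and_margin(7.8149, 7.9718)  # confidence interval

print(round(pi_fit, 3), round(pi_margin, 3))  # 7.893 0.238
print(round(ci_fit, 3), round(ci_margin, 3))  # 7.893 0.078
```

Both centers agree with the fit 7.8933 up to rounding of the endpoints, while the PI margin is about three times the CI margin.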

The next graph shows the intervals, with the prediction interval in red and the confidence interval in blue:

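Notice that the prediction interval is always wider than the confidence interval at the same x value and confidence level: it must cover both the uncertainty in the estimated mean and the scatter of a single observation around that mean. Prediction intervals are also the ones you want most of the time, so if you are not sure which to use, use the prediction interval.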
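To see why the PI is always wider, here is a minimal Python sketch of the usual simple-linear-regression interval formulas. Assumptions: these standard formulas are what the regression output reports for a single predictor, the data are made up for illustration (they are NOT the Quality of Fish data), and the t critical value is passed in rather than computed, since the Python standard library has no t distribution (`scipy.stats.t.ppf` would provide it). The function name `regression_intervals` is hypothetical.

```python
import math

def regression_intervals(xs, ys, x0, t_crit):
    """Least-squares fit of y = b0 + b1*x, then the fitted value at x0,
    the CI for the mean response, and the PI for one new response.
    t_crit is the t critical value with n - 2 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual standard error, with n - 2 degrees of freedom
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    fit = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / s_xx
    ci_margin = t_crit * s * math.sqrt(h)      # uncertainty in the mean only
    pi_margin = t_crit * s * math.sqrt(1 + h)  # plus scatter of one observation
    return fit, (fit - ci_margin, fit + ci_margin), (fit - pi_margin, fit + pi_margin)

# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
# 90% two-sided t critical value for n - 2 = 4 df (from a t table)
fit, ci, pi = regression_intervals(xs, ys, x0=4, t_crit=2.132)
print(fit, ci, pi)
```

Because the PI margin uses sqrt(1 + h) while the CI margin uses sqrt(h), the PI is wider at every x0, whatever the data.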

Prediction vs. Extrapolation

There is a fundamental difference between predicting the response for an x value within the range of observed x values (=Prediction) and for an x value outside that range (=Extrapolation). The problem is that the model is only known to be good over the range of x values used to fit it. Whether the same relationship holds outside this range generally cannot be determined from the data.
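Deciding between prediction and extrapolation for a single predictor comes down to a range check on x. A minimal Python sketch (the function `classify` is hypothetical), illustrated with the observed cigarette range 14.0 to 42.4 from the smoking and lung cancer data:

```python
def classify(x0: float, x_min: float, x_max: float) -> str:
    """Return 'prediction' if x0 lies within the observed range
    [x_min, x_max] of the predictor, otherwise 'extrapolation'."""
    return "prediction" if x_min <= x0 <= x_max else "extrapolation"

# Observed range of CIG in the smoking and lung cancer data
print(classify(22.3, 14.0, 42.4))  # prediction
print(classify(50.0, 14.0, 42.4))  # extrapolation
```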

Example: Quality of Fish data

Example Consider the data on smoking and lung cancer rates. Find a 99% interval estimate for the mean lung cancer rate of states where people smoke 22.3 cigarettes per capita.
The number of cigarettes in the data set ranges from 14.0 to 42.4 (Stat > Basic Statistics > Display Descriptive Statistics, Variable=CIG). Since 22.3 lies within this range, this is a prediction. We want an interval for the mean rate, so we want the confidence interval: Stat > Regression > Regression, Response=LUNG, Predictors=CIG, Options - Prediction interval=22.3, Conf level=99.
The 99% CI for the mean lung cancer rate of states with a per capita consumption of 22.3 cigarettes is (16.890, 19.650).

For more on this, see page 532 of the textbook.