Problem 1
We will use the Hubble's Constant dataset in this problem.
Part 1
Find the slope of the least squares regression line.
Part 2
Check the assumptions of least squares regression.
Part 3
Find the 68% interval estimate for the velocity of a galaxy which is 0.25 parsec from Earth. Is this a "prediction" or an "extrapolation" problem?
Problem 2
For this exercise we will use the Olympics data set, specifically the Discus throw.
Part 1
Is there a relationship between the year and the discus throw? If so, how strong is it?
Part 2
Are there any outliers in the data set? If there are any, what would you do with them?
Part 3
Is a linear model o.k. for the relationship of discus and year?
Part 4
Using the linear model, find a 99% interval estimate for the gold medal winning discus throw in the next Olympics.
Part 5
For each of the following models, find the corresponding equation and decide whether the model gives a good fit to the data: quadratic, cubic, square root, exponential and power.
Part 6
Decide which of the models is best.
Solutions
Problem 1
Part 1
This is done by running the command Stat-Regression-Regression and reading the first line of the output:
The regression equation is
Velocity = - 40.8 + 454 Distance
and so the slope is 454.
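For readers without Minitab, the slope and intercept can also be computed directly from the usual least squares formulas, slope = Sxy / Sxx and intercept = ȳ - slope·x̄. The sketch below is only an illustration: the function name `least_squares_line` and the toy points are made up for this example and are not the Hubble data.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxx: sum of squared deviations of x; Sxy: sum of cross deviations.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are identical")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up points lying exactly on y = 1 + 2x (NOT the Hubble data).
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares_line(toy_x, toy_y)
print(b0, b1)
```

Applied to the Distance and Velocity columns, this should reproduce the intercept and slope that Minitab reports, up to rounding.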
Part 2
There are three assumptions:
a) Do we have a good model? - Check the residual vs. fits plot
b) Do the residuals have a normal distribution? - Check the normal probability plot
c) Do the residuals have equal variance? - Check the residual vs. fits plot
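The two plots used in these checks are built from the fitted values and the residuals. As a rough sketch of what goes into them, the code below computes the (fit, residual) pairs and the points of a normal probability plot. The function names, the toy data, and the line y = x are hypothetical; the plotting positions (i - 0.375)/(n + 0.25) are one common convention and not necessarily the one Minitab uses.

```python
from statistics import NormalDist


def fits_and_residuals(x, y, intercept, slope):
    """Fitted values and residuals for the line y = intercept + slope*x."""
    fits = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    return fits, residuals


def normal_plot_points(residuals):
    """Pairs (theoretical normal quantile, sorted residual).

    Uses plotting positions (i - 0.375)/(n + 0.25), an assumed convention.
    Points close to a straight line suggest roughly normal residuals.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    quantiles = [
        standard_normal.inv_cdf((i - 0.375) / (n + 0.25))
        for i in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))


# Made-up data and a made-up line y = 0 + 1*x, for illustration only.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [1.1, 1.9, 3.2, 3.8]
fits, residuals = fits_and_residuals(toy_x, toy_y, intercept=0.0, slope=1.0)
points = normal_plot_points(residuals)
```

Plotting `residuals` against `fits` gives the residual vs. fits plot used for (a) and (c); plotting `points` gives the normal probability plot used for (b).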
Part 3
We want the interval for one galaxy, so we need the prediction interval.
Go to Stat-Regression-Regression, go to Options, type the desired x value in the box, and change 95 to the desired value.
Answer: the 68% PI for Velocity is (-174.3, 319.9).
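As a quick check, the regression equation gives the point estimate -40.8 + 454 × 0.25 = 72.7, which agrees with the midpoint of the interval, (-174.3 + 319.9)/2 = 72.8, up to rounding. The interval itself follows the standard formula ŷ₀ ± t·s·√(1 + 1/n + (x₀ - x̄)²/Sxx), where s is the residual standard deviation and t is the critical value of the t distribution with n - 2 degrees of freedom. The sketch below implements that formula; the function name is hypothetical, the toy data are made up, and the caller is assumed to look up `t_crit` in a t table (the Python standard library has no t distribution).

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Prediction interval for one new observation at x0.

    t_crit is the t critical value with n - 2 degrees of freedom for the
    desired level; it must be supplied by the caller (assumption).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = intercept + slope * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


# Made-up data (NOT the Hubble data); t_crit = 1.0 is a placeholder value.
lower, upper = prediction_interval([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.0, 1.0)
print(lower, upper)
```

The half-width grows as x₀ moves away from x̄, which is why intervals far from the centre of the data are wider.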
Difference between Prediction and Extrapolation.
If the x value is within the observed x values, it is prediction, otherwise it is extrapolation.
Above we found the interval for Distance=0.25. The range of Distance in the dataset is 0.0320 to 2.000, so this is a prediction problem. If we had found the PI for Distance=2.7, it would have been an extrapolation problem.
Note: The word "prediction" is used with two different meanings here:
a) Prediction Interval: an interval estimate for an individual observation
b) Prediction (as compared to extrapolation)
Say we want the 68% interval for a galaxy which is 2.7 parsec from Earth. Then the answer is the 68% prediction interval (907.5, 1463.4). It is called a prediction interval, but it is also an extrapolation.
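The prediction vs. extrapolation rule is just a range check on the x value. A minimal sketch, using only the smallest and largest Distance values quoted above (the function name is made up for this example):

```python
def classify_x_value(x0, observed_x):
    """Return 'prediction' if x0 lies within the observed x range,
    otherwise 'extrapolation'."""
    lowest, highest = min(observed_x), max(observed_x)
    return "prediction" if lowest <= x0 <= highest else "extrapolation"


# Only the endpoints of the observed Distance range matter for this check.
distance_range = [0.032, 2.0]
print(classify_x_value(0.25, distance_range))  # prediction
print(classify_x_value(2.7, distance_range))   # extrapolation
```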
Problem 2
Part 1
From the scatter plot of discus vs. year it is clear that we have a strong positive relationship. The Pearson correlation of Year and Discus is r = 0.979 with a P-Value of 0.000.
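The Pearson correlation is r = Sxy / √(Sxx·Syy). The sketch below shows the calculation; the function name and the toy values are made up for illustration and are not the Olympics data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of the paired values x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for constant data")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up values: a perfect increasing and a perfect decreasing line.
r_up = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
r_down = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])
print(r_up, r_down)
```

Values of r close to +1, such as the 0.979 reported here, indicate a strong positive linear relationship.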
Part 2
A possible outlier is the observation in the lower left corner of the scatter plot. This outlier comes from the first Olympic games in 1896. It appears to be only a slight outlier, therefore I would leave it alone.
Part 3
The residuals vs. fits plot shows a pattern similar to an upside-down parabola, therefore a linear model is not o.k.
Part 4
Because only one person will win this gold medal, we will find the 99% prediction interval. This is given by (2640, 3240).
Part 5
| Model | Equation | Fit |
|---|---|---|
| Square Root | y = -246 + 0.15√x | not o.k. |
| Exponential | y = 0.0046 × 10^(0.0029x) | not o.k. |
| Power | y = 10^(-39.6) × x^13.0 | not o.k. |
| Quadratic | y = -253000 + 248x - 0.06x^2 | o.k. |
| Cubic | y = 892000 - 1510x + 0.844x^2 - 0.00015x^3 | o.k. |
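The base-10 form of the exponential and power equations suggests they came from straight-line fits on log-transformed data, although the source does not say so. Under that assumption, the exponential model regresses log10(y) on x and the power model regresses log10(y) on log10(x), and the result is transformed back. The sketch below illustrates this with hypothetical function names and made-up data (not the Olympics data).

```python
import math


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def fit_exponential(x, y):
    """Fit y = a * 10**(b*x) by regressing log10(y) on x (assumed method)."""
    log_y = [math.log10(yi) for yi in y]
    intercept, slope = least_squares_line(x, log_y)
    return 10 ** intercept, slope


def fit_power(x, y):
    """Fit y = a * x**b by regressing log10(y) on log10(x) (assumed method)."""
    log_x = [math.log10(xi) for xi in x]
    log_y = [math.log10(yi) for yi in y]
    intercept, slope = least_squares_line(log_x, log_y)
    return 10 ** intercept, slope


# Made-up data generated exactly from y = 2 * 10**(0.5x) and y = 3 * x**2.
xs = [1.0, 2.0, 3.0, 4.0]
exp_a, exp_b = fit_exponential(xs, [2 * 10 ** (0.5 * xi) for xi in xs])
pow_a, pow_b = fit_power(xs, [3 * xi ** 2 for xi in xs])
print(exp_a, exp_b, pow_a, pow_b)
```

Both fits require positive y values (and positive x values for the power model), since the logarithm is undefined otherwise. Whether a transformed model is "o.k." is still judged from its own residual vs. fits plot.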
Part 6
Step 1: The linear model is bad, so we go on.
Step 2: Transformations
None of them is good.
Step 3: Polynomial Model
| Model | p-value of highest order term |
|---|---|
| Quadratic | 0.005 |
| Cubic | 0.388 |
The quadratic term is significant but the cubic term is not, so the quadratic is the polynomial model of choice.
Step 4: The best model is the quadratic.
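Step 3 can be read as a simple rule: raise the degree of the polynomial for as long as the highest order term is significant, and stop at the first degree where it is not. The sketch below is one way to write that rule down, not necessarily the exact procedure intended here. The function name and the significance level alpha = 0.05 are assumptions; the p-values are the ones from the table above.

```python
def choose_polynomial_degree(p_values, alpha=0.05):
    """Pick the highest degree whose highest order term is significant.

    p_values maps polynomial degree -> p-value of its highest order term.
    alpha = 0.05 is an assumed significance level.
    Returns None if even the lowest degree listed is not significant.
    """
    chosen = None
    for degree in sorted(p_values):
        if p_values[degree] < alpha:
            chosen = degree
        else:
            break  # stop at the first non-significant highest order term
    return chosen


# p-values of the highest order term, taken from the table above.
best_degree = choose_polynomial_degree({2: 0.005, 3: 0.388})
print(best_degree)  # 2, i.e. the quadratic model
```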