
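• $R^2 = 0\%$: the model explains none of the variation in Y (no relationship of the fitted form between X and Y)
• $R^2 = 100\%$: the model explains all of the variation in Y (perfect relationship: every data point lies on the fitted curve)
• If $R^2$ of model A is greater than $R^2$ of model B, then model A fits the data better than model B
$R^2$ is part of the output of the MINITAB regression command.
Example: Alcohol and Tobacco data (without NI): $r = 0.784$, and for the linear model $R^2 = r^2 \cdot 100\% = 0.784^2 \cdot 100\% = 0.615 \cdot 100\% = 61.5\%$.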
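For the linear model, $R^2$ is simply the square of the correlation coefficient. The following Python sketch checks this numerically on a small data set. The values of `x` and `y` are made up; they are not the Alcohol and Tobacco data. `pearson_r` and `linear_r_squared` are illustrative helper names, not MINITAB commands.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def linear_r_squared(x, y):
    """R^2 (as a fraction) of the least squares line y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual sum of squares
    sst = sum((b - my) ** 2 for b in y)                         # total sum of squares
    return 1 - sse / sst

# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.1]

r = pearson_r(x, y)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, R^2 = {linear_r_squared(x, y):.3f}")
```

The two printed values `r^2` and `R^2` agree. This identity holds only for the straight-line model; for the polynomial models below, $R^2$ has to be computed from the sums of squares.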
Example: Elusage:
Linear model: $R^2 = 78.0\%$
Quadratic model: $R^2 = 84.7\%$
Cubic model: $R^2 = 84.7\%$
...
Polynomial model of degree 10: whatever its $R^2$ is, it will be at least 84.7%, and probably even higher.
The reason for this is simple. Say we find the best (least squares) quadratic model, which is
$$\text{Usage} = 196.7 - 4.640 \cdot \text{Temperature} + 0.03073 \cdot \text{Temperature}^2$$
Now we add the cubic term $\text{Temperature}^3$ as a predictor. One cubic model is
$$\text{Usage} = 196.7 - 4.640 \cdot \text{Temperature} + 0.03073 \cdot \text{Temperature}^2 + 0 \cdot \text{Temperature}^3$$
This is of course the same model as above, so it also has $R^2 = 84.7\%$. The least squares cubic model has the smallest sum of squared residuals of all cubic models, this one included, so its $R^2$ cannot be smaller (and usually it will be a bit higher, even if the cubic term is not necessary).
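The argument can also be checked numerically. The plain-Python sketch below fits least squares polynomials by solving the normal equations. That approach is fine for low degrees but numerically fragile for high ones. The sketch then shows two things:

- Padding the quadratic fit with a zero cubic coefficient leaves $R^2$ unchanged.
- $R^2$ never decreases as the degree goes up.

The data and the helper names `fit_polynomial` and `r_squared` are hypothetical.

```python
def fit_polynomial(x, y, degree):
    """Least squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    m = degree + 1
    # Augmented normal equations [X'X | X'y] for the columns 1, x, x^2, ..., x^degree
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * m
    for i in reversed(range(m)):
        b[i] = (a[i][m] - sum(a[i][j] * b[j] for j in range(i + 1, m))) / a[i][i]
    return b

def r_squared(x, y, coeffs):
    """R^2 = 1 - SSE/SST (as a fraction) for the polynomial with these coefficients."""
    mean_y = sum(y) / len(y)
    fits = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data, for illustration only
x = [-3, -2, -1, 0, 1, 2, 3]
y = [5.2, 3.1, 2.0, 1.8, 2.5, 4.1, 6.9]

quad = fit_polynomial(x, y, 2)
# Appending a zero cubic coefficient gives the same curve, hence the same R^2
padded = quad + [0.0]
print(r_squared(x, y, quad) == r_squared(x, y, padded))  # True

# R^2 (in percent) for degrees 1 to 4: the sequence never goes down
for d in range(1, 5):
    print(d, round(100 * r_squared(x, y, fit_polynomial(x, y, d)), 1))
```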
Question: which of these polynomial models should you use?
For the Alcohol vs Tobacco data (without NI) and a 9th degree polynomial we get:

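It is always possible to find a polynomial model that fits the data set perfectly, that is, with $R^2 = 100\%$. If the $n$ data points all have different X values, the polynomial of degree $n-1$ through them passes through every point!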
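Here is a sketch of why a perfect fit is always available. With $n$ points and distinct X values, the Lagrange interpolating polynomial of degree at most $n-1$ passes through every point. That gives SSE = 0 and therefore $R^2 = 100\%$. Since SSE cannot be negative, this polynomial is also the least squares polynomial of that degree. The data below are hypothetical, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def lagrange_eval(xs, ys, t):
    """Value at t of the polynomial of degree <= n-1 through all n points (xs[i], ys[i]).

    Lagrange form; the xs must be distinct."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(t - xj, xi - xj)
        total += term
    return total

# Hypothetical data: n = 5 points with distinct x values
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 5]

fits = [lagrange_eval(xs, ys, t) for t in xs]
mean_y = Fraction(sum(ys), len(ys))
sse = sum((y - f) ** 2 for y, f in zip(ys, fits))
sst = sum((y - mean_y) ** 2 for y in ys)
print("R^2 =", 1 - sse / sst)  # R^2 = 1
```

This degree 4 curve reproduces all five points exactly, including the random ups and downs. That is precisely the kind of fit the next paragraph warns against.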
But: we want our models to fit the relationship, not the random fluctuations in the dataset.
A model should be parsimonious, that is, as simple as possible (Occam's razor).
Solution: Use the polynomial model of lowest degree where the p-value of the hypothesis test for the highest order term is less than 0.05 (that is, where the highest order term is statistically significant).
Example: Elusage
Quadratic model:

| Predictor | p-value |
|---|---|
| Temperature | 0.000 |
| Temperature**2 | 0.000 |

Cubic model:

| Predictor | p-value |
|---|---|
| Temperature | 0.097 |
| Temperature**2 | 0.442 |
| Temperature**3 | 0.752 |
So the polynomial model of lowest degree whose highest order term has a p-value less than 0.05 is the quadratic model. For the quadratic, Temperature**2 has p = 0.000; in the cubic model, Temperature**3 has p = 0.752. The quadratic is therefore our best choice.
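Once the p-values are known, the selection rule itself is mechanical. The minimal sketch below makes a few assumptions:

- You have already read off the p-value of the highest order term of each polynomial model you fitted. Here these are the Elusage values from the tables above.
- The dictionary lists only the polynomial models you are comparing.
- A MINITAB p-value printed as 0.000 is treated as 0.

`choose_degree` is a hypothetical helper.

```python
def choose_degree(top_term_pvalues, alpha=0.05):
    """Lowest degree whose highest order term has p-value < alpha, or None if there is none.

    top_term_pvalues maps degree -> p-value of that model's highest order term."""
    significant = [d for d, p in top_term_pvalues.items() if p < alpha]
    return min(significant) if significant else None

# Highest order term p-values from the Elusage output:
# quadratic model (Temperature**2) and cubic model (Temperature**3)
elusage = {2: 0.000, 3: 0.752}
print(choose_degree(elusage))  # 2, i.e. the quadratic model
```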
A model is "good" if its Residual vs. Fits plot shows no pattern.
Step 1: If the linear model is good, use it; you are done.
If the linear model is not good, proceed as follows:
Step 2: Check the square root, exponential and power models, and see which of these (if any) are good.
Step 3: Find the best polynomial model (using the p-value rule above).
Step 4: Among the good models found in Steps 2 and 3, choose the one with the highest $R^2$.
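The four steps can be sketched as a small decision function. This is an illustration under assumptions:

- Whether a model is "good" is your own judgment from its Residual vs. Fits plot, passed in as a boolean.
- The `Candidate` class and the function name `choose_model` are hypothetical.
- All numbers in the usage example are made up; they are not Elusage results.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    good: bool        # your judgment: no pattern in the Residual vs. Fits plot
    r_squared: float  # R^2 in percent

def choose_model(linear: Candidate,
                 transformed: List[Candidate],
                 best_polynomial: Optional[Candidate]) -> Optional[Candidate]:
    # Step 1: a good linear model ends the search
    if linear.good:
        return linear
    # Step 2: keep whichever of the square root, exponential and power models are good
    good = [m for m in transformed if m.good]
    # Step 3: the best polynomial model competes only if it is good as well
    if best_polynomial is not None and best_polynomial.good:
        good.append(best_polynomial)
    # Step 4: among the good models, choose the one with the highest R^2
    return max(good, key=lambda m: m.r_squared) if good else None

# Made-up verdicts and R^2 values, for illustration only
linear = Candidate("linear", False, 70.0)
transformed = [
    Candidate("square root", False, 72.0),
    Candidate("exponential", True, 80.0),
    Candidate("power", True, 82.0),
]
poly = Candidate("quadratic", True, 85.0)
print(choose_model(linear, transformed, poly).name)  # quadratic
```

The function returns `None` when no candidate is good. The procedure does not say what to do in that case.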
Example Elusage.