| | Sqfeet | Floors | Bedrooms | Baths |
|---|---|---|---|---|
| Graph | Scatterplot | Multiple Boxplot | Multiple Boxplot | Multiple Boxplot |
| Is there a relationship? | Yes, r=0.891 (p=0.00) | If so, weak, r=0.321 (p=0.090) | Yes, r=0.674 (p=0.00) | Yes, r=0.741 (p=0.00) |
| Do the residuals have a normal distribution? | Yes | Yes | Yes | Yes |
| Do we have equal variance? | Yes | Yes | Yes | Yes |
| Is a linear relationship likely? | Yes | Yes | Yes | Yes |

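
The p-values quoted next to each correlation come from a test of whether the true correlation is zero. As a rough sketch of the usual t-test for this (an assumption about how the p-values were obtained, since the text does not say), the statistic can be computed from r and the sample size. The sample size `n_assumed` below is hypothetical; the text does not state how many houses are in the dataset.

```python
import math


def correlation_t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return (t, degrees of freedom) for testing H0: true correlation = 0."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Hypothetical sample size: NOT taken from the text.
n_assumed = 30

# Correlations with Price as reported in the table above.
for name, r in [("Sqfeet", 0.891), ("Floors", 0.321),
                ("Bedrooms", 0.674), ("Baths", 0.741)]:
    t, df = correlation_t_statistic(r, n_assumed)
    print(f"{name}: t = {t:.2f} on {df} df")
```

The t value is then compared with a t distribution on n - 2 degrees of freedom. A larger |t| means a smaller p-value, which is why the weak Floors correlation is the only one that is not clearly significant.
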
There are several possible outliers, especially observations #2 and #29. What to do with them is somewhat difficult to tell. Here is what the graphs look like without #2:

This does look ok, so we will use this dataset.
---------------------------------------------------------------------------------------------
This is the end of the preliminary analysis. Note that so far there has been no mention of regression, the residuals vs. fits plot or the normal plot. Making decisions about possible transformations and/or polynomial models early, based solely on scatterplots and/or boxplots, is the whole point of doing a preliminary analysis.
---------------------------------------------------------------------------------------------
The regression equation is:
Price = 12.0 + 8.16 Sqfeet - 26.5 Floors - 9.29 Bedrooms + 37.4 Baths
Stat > Regression > Regression, Response= Price, Predictors= Sqfeet Floors Bedrooms Baths, Graphs > Normal Plot and Residuals vs. Fits Plot
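
To see how the fitted equation is used, here is a minimal sketch that evaluates it for one house. The coefficients are the ones reported above; the example house is made up for illustration, and its values are assumed to be in the same units as the dataset (which the text does not state).

```python
# Coefficients of the four-predictor model as reported above.
FULL_MODEL = {
    "Intercept": 12.0,
    "Sqfeet": 8.16,
    "Floors": -26.5,
    "Bedrooms": -9.29,
    "Baths": 37.4,
}


def predict_price(coefficients: dict, house: dict) -> float:
    """Evaluate the regression equation for one house."""
    price = coefficients["Intercept"]
    for name, slope in coefficients.items():
        if name != "Intercept":
            # Raises KeyError if the house is missing a predictor.
            price += slope * house[name]
    return price


# Hypothetical house: NOT an observation from the dataset.
example_house = {"Sqfeet": 20.0, "Floors": 2, "Bedrooms": 3, "Baths": 2}
print(round(predict_price(FULL_MODEL, example_house), 2))
```

Each coefficient is the change in predicted Price for a one-unit increase in that predictor with the others held fixed, so the negative signs on Floors and Bedrooms are conditional on Sqfeet and Baths being in the model.
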
Here are the diagnostic plots:
This appears to be a good model, and the assumptions of normally distributed residuals with equal variance appear to be satisfied.
The highest correlation between predictors is r=0.743 (Floors-Baths)
R² = 88.6%
The constant is not statistically significant (p value 0.504), so we might consider fitting a no-intercept model.
Of the predictor variables, Bedrooms is not statistically significant (p value > 0.05); all other variables are.
Can we eliminate any predictors from the model? Using best subset regression we find that the best model uses Sqfeet, Floors and Baths (Mallows' Cp=4.8).
Stat > Regression > Best Subsets, Response= Price, Free Predictors= Sqfeet Floors Bedrooms Baths
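
For reference, Mallows' Cp for a subset model is commonly defined as SSE(subset) / MSE(full) - (n - 2p), where p counts the subset's parameters including the intercept. The sketch below rewrites this in terms of R² values, which is algebraically equivalent when both models are fitted to the same data. The sample size `n_assumed` is hypothetical (the text does not give it), and because the R² values in this section (88.6% and 87.7%) are rounded, the result will only approximate what MINITAB reports.

```python
def mallows_cp(r2_subset: float, r2_full: float, n: int,
               k_full: int, k_subset: int) -> float:
    """Mallows' Cp from R-squared values (given as fractions, not percent).

    k_full and k_subset are the numbers of predictors, excluding the intercept.
    """
    df_error_full = n - k_full - 1
    if df_error_full <= 0:
        raise ValueError("n must exceed the number of parameters in the full model")
    p = k_subset + 1  # parameters in the subset model, including the intercept
    # SSE(subset) / MSE(full), with the total sum of squares cancelled out.
    sse_over_mse = (1.0 - r2_subset) * df_error_full / (1.0 - r2_full)
    return sse_over_mse - (n - 2 * p)


# Hypothetical sample size: NOT taken from the text.
n_assumed = 30

print(mallows_cp(0.877, 0.886, n_assumed, k_full=4, k_subset=3))  # Sqfeet, Floors, Baths
print(mallows_cp(0.886, 0.886, n_assumed, k_full=4, k_subset=4))  # all four predictors
```

One consequence of the definition is that the full model always has Cp equal to its number of parameters, whatever the sample size. With four predictors plus an intercept that is 5.
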
The regression equation of this model is:
Price = - 1.8 + 7.40 Sqfeet - 19.7 Floors + 30.6 Baths
R² = 87.7%
Note that the model with all four predictors has Cp=5.0. But Cp is a statistic, so its exact value depends on the sample. So is the model with Sqfeet, Floors and Baths statistically significantly better than the model with all four predictors? We would need a hypothesis test to answer this question, but MINITAB does not provide one.
For more on Mallows' Cp see page 603 of the textbook.
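
One standard way to compare two nested regression models fitted to the same data is the partial F-test. It is not part of the analysis above, so the following is only a sketch of how such a comparison could be set up from the two R² values. The sample size `n_assumed` is hypothetical because the text does not state it, and the rounded R² values make the result approximate.

```python
def partial_f_statistic(r2_full: float, r2_reduced: float, n: int,
                        k_full: int, q: int) -> tuple[float, int, int]:
    """Partial F statistic for dropping q predictors from a k_full-predictor model.

    R-squared values are fractions, not percent.
    Returns (F, numerator df, denominator df).
    """
    df_error = n - k_full - 1
    if df_error <= 0 or q <= 0:
        raise ValueError("need n > k_full + 1 and q >= 1")
    if r2_reduced > r2_full:
        raise ValueError("a nested reduced model cannot have the larger R-squared")
    f = ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_error)
    return f, q, df_error


# Hypothetical sample size: NOT taken from the text.
n_assumed = 30

# Full model: Sqfeet, Floors, Bedrooms, Baths. Reduced model drops Bedrooms.
f, df1, df2 = partial_f_statistic(0.886, 0.877, n_assumed, k_full=4, q=1)
print(f"F = {f:.2f} on ({df1}, {df2}) df")
```

The statistic is compared with an F distribution on (q, n - k - 1) degrees of freedom. When only one predictor is dropped, this F equals the square of that predictor's t statistic in the full model, so the test carries the same information as the t-test on Bedrooms.
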