Example: Environmental safety and health data
First we need to change Sex to a Numeric column so it can be used as a predictor. We do this as follows:
Data > Code > Text to Numeric, Code Data from Columns: Sex, Into Columns: SexCode, Female=0, Male=1
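Outside Minitab, the same recoding can be sketched in Python with a lookup table. The sample values below are hypothetical. Only the coding Female=0, Male=1 comes from the command above.

```python
# Hypothetical sample of the Sex column (not the actual worksheet data)
sex = ["Female", "Male", "Male", "Female"]

# Same coding as in the Minitab command: Female=0, Male=1
coding = {"Female": 0, "Male": 1}

# Build the numeric SexCode column; an unexpected label raises KeyError
sex_code = [coding[s] for s in sex]
print(sex_code)
```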
Now we can run the regression:
Stat > Regression > Regression, Response= es&h, Predictors= Yrs Serv SexCode
ES&H = 7.04 + 0.0969 Yrs Serv - 2.59 SexCode
The residuals vs. fits plot and the normal probability plot both look good, so this seems to be a good model.
Or is it?
Let's do the following: run the regression again, but this time store the residuals and the fits in the worksheet:
Stat > Regression > Regression, Response= es&h, Predictors= Yrs Serv SexCode, Storage > Residuals, Fits
The residuals vs. fits plot is of course just the scatterplot of the residuals vs. the fits. But how about this graph:
Graph > Scatterplot, With Groups and Regression > Y variables=RESI1, X variables=FITS1, Categorical variable: Sex

The same principle as always applies: any pattern in the residuals vs. fits plot is a problem!
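The kind of pattern such a grouped plot can reveal can be reproduced with made-up data. The sketch below assumes NumPy is available. It generates noise-free data in which the two groups have different slopes and fits the parallel-lines model anyway. It then correlates residuals with fits, first overall and then within each group. The numbers are hypothetical and are not the ES&H data.

```python
import numpy as np

# Hypothetical, noise-free data (NOT the ES&H dataset):
# females follow 7 + 0.07*x, males follow 4 + 0.14*x, so the true slopes differ.
x = np.arange(0, 31, dtype=float)            # years of service 0..30
yrs = np.concatenate([x, x])
sex_code = np.concatenate([np.zeros_like(x), np.ones_like(x)])
y = np.where(sex_code == 0, 7 + 0.07 * yrs, 4 + 0.14 * yrs)

# Fit the additive (parallel lines) model y = b0 + b1*yrs + b2*sex_code
X = np.column_stack([np.ones_like(yrs), yrs, sex_code])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fits = X @ coef
resid = y - fits

# Overall, least-squares residuals are uncorrelated with the fits ...
overall = np.corrcoef(fits, resid)[0, 1]
# ... but within each group they trend with the fits, in opposite directions
female = np.corrcoef(fits[sex_code == 0], resid[sex_code == 0])[0, 1]
male = np.corrcoef(fits[sex_code == 1], resid[sex_code == 1])[0, 1]
print(overall, female, male)
```

Overall the correlation is essentially zero, as least squares guarantees. Within the groups, however, it is close to -1 for the females and close to +1 for the males. The pooled residuals vs. fits plot hides this pattern, and coloring the points by group exposes it.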
So what's going on here?
What does our equation predict for a Female? Now SexCode=0, and so:
Female: ES&H = 7.04 + 0.0969 Yrs Serv - 2.59·0, so
ES&H = 7.04 + 0.0969 Yrs Serv
Similarly for a Male, where now SexCode=1:
Male: ES&H = 7.04 + 0.0969 Yrs Serv - 2.59·1, so
ES&H = 4.45 + 0.0969 Yrs Serv
Both lines have the same slope (0.0969), so they are parallel.
Important: If we use a discrete variable as just another predictor in a regression model we always fit parallel lines!
What does the fitted line plot look like? We need to work a little bit to get this graph:
Make a new column x containing the values 0 and 30 (the range of values in Yrs Serv).
Stat > Regression > Regression, Response= es&h, Predictors= Yrs Serv SexCode, Options > Prediction intervals x 0 (Yrs Serv = x, SexCode = 0), check store Fits.
Stat > Regression > Regression, Response= es&h, Predictors= Yrs Serv SexCode, Options > Prediction intervals x 1 (Yrs Serv = x, SexCode = 1), check store Fits.
Graph > Scatterplot, With groups > Y variables=es&h, X variables= Yrs Serv, Categorical variable: Sex
Right-click graph, Add > Calculated Line, y=PFIT1, x=x
Right-click graph, Add > Calculated Line, y=PFIT2, x=x
Change the colors of the lines to match the colors of the dots.

This is also called an additive model, because going from one value of the discrete predictor (say 0) to another (say 1) changes each fitted value by the same amount (here 4.45 - 7.04 = -2.59), no matter what the value of the continuous predictor is.
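The calculated lines and the additive property can be checked numerically. The sketch below evaluates the fitted equation at x = 0 and x = 30, the two values that PFIT1 and PFIT2 hold in the plot. It then confirms that the gap between the Male and Female fits is the same at every Yrs Serv value. Only the coefficients come from the text. The grid of Yrs Serv values is an arbitrary choice for illustration.

```python
import numpy as np

# Additive model from the text: ES&H = 7.04 + 0.0969 Yrs Serv - 2.59 SexCode
def fit(yrs, sex_code):
    return 7.04 + 0.0969 * yrs - 2.59 * sex_code

x = np.array([0.0, 30.0])          # the column x used for the calculated lines
pfit_female = fit(x, 0)            # plays the role of PFIT1
pfit_male = fit(x, 1)              # plays the role of PFIT2

# The gap between the two lines is the same at every Yrs Serv value
grid = np.linspace(0, 30, 7)
gap = fit(grid, 1) - fit(grid, 0)
print(pfit_female, pfit_male, gap)
```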
Sometimes parallel lines are a good model, but not always. Here is an example where using parallel lines is clearly wrong, even though parallel lines are automatically what you get if you simply regress the response on the continuous and the discrete predictor!
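Now use Yrs Serv and SexCode*Yrs Serv as the predictors, which gives ES&H = 5.60 + 0.167 Yrs Serv - 0.136 SexCode*Yrs Serv. Again, what does this look like for Females and Males?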
Female: ES&H = 5.60 + 0.167 Yrs Serv - 0.136·0·Yrs Serv = 5.60 + 0.167 Yrs Serv
Male: ES&H = 5.60 + 0.167 Yrs Serv - 0.136·1·Yrs Serv = 5.60 + 0.031 Yrs Serv
This model always fits lines with the same intercept.
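The same plug-in arithmetic can be written as a small function. Only the coefficients come from the text. Because SexCode enters only through the product SexCode*Yrs Serv, it can change the slope but never the intercept. For a Male the product column is simply Yrs Serv itself. This is why the second prediction in the commands below uses x for both predictors.

```python
# Interaction-only model from the text:
# ES&H = 5.60 + 0.167 Yrs Serv - 0.136 SexCode*Yrs Serv
B0, B_YRS, B_SEXYRS = 5.60, 0.167, -0.136

def line_for(sex_code):
    """Return (intercept, slope) of the fitted line for SexCode = 0 or 1."""
    intercept = B0                          # no SexCode main effect: shared intercept
    slope = B_YRS + B_SEXYRS * sex_code     # the product term changes only the slope
    return intercept, slope

print(line_for(0), line_for(1))
```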
Commands (first store the product SexCode*Yrs Serv in a new column, for example with Calc > Calculator):
Stat > Regression > Regression, Response= es&h, Predictors= Yrs Serv SexCode*Yrs Serv, Options > Prediction intervals x 0, check store Fits.
Stat > Regression > Regression, Response= es&h, Predictors= Yrs Serv SexCode*Yrs Serv, Options > Prediction intervals x x, check store Fits.
Graph > Scatterplot, With groups > Y variables=es&h, X variables= Yrs Serv, Categorical variable: Sex
Right-click graph, Add > Calculated Line, y=PFIT1, x=x
Right-click graph, Add > Calculated Line, y=PFIT2, x=x
Change the colors of the lines to match the colors of the dots.
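What happens if we use Yrs Serv, SexCode and SexCode*Yrs Serv as predictors? The fitted equation is ES&H = 7.32 + 0.0722 Yrs Serv - 3.20 SexCode + 0.0653 SexCode*Yrs Serv, so: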
Female: ES&H = 7.32 + 0.0722 Yrs Serv - 3.20·0 + 0.0653·0·Yrs Serv = 7.32 + 0.0722 Yrs Serv
Male: ES&H = 7.32 + 0.0722 Yrs Serv - 3.20·1 + 0.0653·1·Yrs Serv = 4.12 + 0.1375 Yrs Serv
So this fits two completely separate lines, with different intercepts and different slopes.

This graph is easy to get: Graph > Scatterplot > with Groups and Regression
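In general notation, write S for SexCode and Yrs for Yrs Serv (shorthand introduced here). The full model then splits into two lines as follows, and the last line plugs in the coefficients from the text:

```latex
\begin{aligned}
\widehat{\mathrm{ES\&H}} &= b_0 + b_1\,\mathrm{Yrs} + b_2\,S + b_3\,S\,\mathrm{Yrs}\\
S = 0:\quad \widehat{\mathrm{ES\&H}} &= b_0 + b_1\,\mathrm{Yrs}\\
S = 1:\quad \widehat{\mathrm{ES\&H}} &= (b_0 + b_2) + (b_1 + b_3)\,\mathrm{Yrs}\\
&= (7.32 - 3.20) + (0.0722 + 0.0653)\,\mathrm{Yrs} = 4.12 + 0.1375\,\mathrm{Yrs}
\end{aligned}
```

So b_2 is the difference in intercepts and b_3 is the difference in slopes between Males and Females. Setting b_3 = 0 gives the parallel-lines model. Setting b_2 = 0 gives the same-intercept model.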
Note: you can get the same two equations by splitting the dataset into two parts, the scores and years of service of the Females and those of the Males, and then doing a simple regression for each part. Doing one multiple regression has some advantages, though. For example, you get one R² for the whole problem instead of a separate one for each part.
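The equivalence can be checked numerically. The sketch below assumes NumPy and uses hypothetical, randomly generated data rather than the ES&H data. It fits one regression with SexCode and SexCode*Yrs Serv and then two separate simple regressions. The intercepts and slopes agree (up to rounding), as the algebra predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (NOT the ES&H dataset): 20 females and 20 males
yrs = rng.uniform(0, 30, size=40)
sex_code = np.repeat([0.0, 1.0], 20)
y = 6 + 0.1 * yrs - 2 * sex_code + 0.05 * sex_code * yrs + rng.normal(0, 0.5, 40)

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# One multiple regression with SexCode and SexCode*Yrs Serv
ones = np.ones_like(yrs)
b0, b1, b2, b3 = ols(np.column_stack([ones, yrs, sex_code, sex_code * yrs]), y)

# Two simple regressions, one per group
f, m = sex_code == 0, sex_code == 1
female_fit = ols(np.column_stack([ones[f], yrs[f]]), y[f])
male_fit = ols(np.column_stack([ones[m], yrs[m]]), y[m])

# The same two lines either way
print(np.allclose(female_fit, [b0, b1]), np.allclose(male_fit, [b0 + b2, b1 + b3]))
```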
With two discrete predictors, SexCode and RaceCode, there are many more possible models:
1) ES&H by Yrs Serv, SexCode and RaceCode = all parallel lines

2) ES&H by Yrs Serv, SexCode*Yrs Serv and RaceCode = two pairs of lines with equal intercept

3) ES&H by Yrs Serv, SexCode and RaceCode*Yrs Serv = two pairs of lines with equal intercept

4) ES&H by Yrs Serv, SexCode*Yrs Serv and RaceCode*Yrs Serv = four lines with equal intercept

5) ES&H by Yrs Serv, SexCode, SexCode*Yrs Serv and RaceCode = two pairs of parallel lines

6) ES&H by Yrs Serv, SexCode, RaceCode and RaceCode*Yrs Serv = two pairs of parallel lines

7) ES&H by Yrs Serv, SexCode, SexCode*Yrs Serv, RaceCode and RaceCode*Yrs Serv = four different lines
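The shapes in the list above follow from the same plug-in arithmetic as before. The sketch below returns the intercept and slope for each of the four Sex/Race groups, for any choice of terms. It assumes that RaceCode, like SexCode, is coded 0/1. The coefficient values are hypothetical and were chosen only to show the shapes. A term left out of a model simply gets coefficient 0.

```python
from itertools import product

def group_lines(coef):
    """Return {(sex, race): (intercept, slope)} for the model
    ES&H = b0 + yrs*Yrs + sex*S + race*R + sexyrs*S*Yrs + raceyrs*R*Yrs.
    Terms missing from `coef` have coefficient 0 (assumes 0/1 codes)."""
    c = {"b0": 0, "yrs": 0, "sex": 0, "race": 0, "sexyrs": 0, "raceyrs": 0, **coef}
    lines = {}
    for s, r in product((0, 1), repeat=2):
        intercept = c["b0"] + c["sex"] * s + c["race"] * r
        slope = c["yrs"] + c["sexyrs"] * s + c["raceyrs"] * r
        lines[(s, r)] = (intercept, slope)
    return lines

# Hypothetical coefficients for model 1 (Yrs Serv, SexCode, RaceCode)
model_1 = group_lines({"b0": 6, "yrs": 0.1, "sex": -2, "race": 1})
# Hypothetical coefficients for model 4 (Yrs Serv, SexCode*Yrs Serv, RaceCode*Yrs Serv)
model_4 = group_lines({"b0": 6, "yrs": 0.1, "sexyrs": -0.05, "raceyrs": 0.03})

print(model_1)   # four different intercepts, one common slope: parallel lines
print(model_4)   # one common intercept, four different slopes
```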

If we leave RaceCode out and run Best Subsets Regression on Yrs Serv, SexCode and SexCode*Yrs Serv, the best model is the one with all three predictors (Mallows' Cp = 4.0).
There is something strange here: with RaceCode in the model, SexCode*Yrs Serv is significant; without RaceCode it is not. How is this possible?
One possible explanation would be a high correlation between SexCode and RaceCode. However, that is not the case here: the correlation is actually 0! (Why?)
What else might explain this? Note that in the Best Subsets Regression with Yrs Serv, SexCode and SexCode*Yrs Serv, the model with just Yrs Serv and SexCode is second best (Mallows' Cp = 4.2). Is this model actually statistically significantly worse than the model with Mallows' Cp = 4.0? Cp is a statistic, that is, it depends on random fluctuations in the data. Whether a difference of 0.2 is statistically significant is hard to tell, but my guess is that it is not.
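Mallows' Cp for a submodel with p parameters (intercept included) is usually computed as Cp = SSE_p / MSE_full - n + 2p. Here MSE_full is the mean squared error of the model with all candidate predictors. For the full model itself this formula always reduces to p. That is why the model with Yrs Serv, SexCode and SexCode*Yrs Serv has Cp = 4.0 exactly (three predictors plus the intercept). The sketch below assumes NumPy and hypothetical random data, not the ES&H data.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def mallows_cp(X_sub, X_full, y):
    """Mallows' Cp = SSE_p / MSE_full - n + 2p, where p counts the
    parameters (including the intercept) of the submodel X_sub."""
    n, p_full = X_full.shape
    mse_full = sse(X_full, y) / (n - p_full)
    p = X_sub.shape[1]
    return sse(X_sub, y) / mse_full - n + 2 * p

# Hypothetical noisy data (NOT the ES&H dataset)
rng = np.random.default_rng(1)
yrs = rng.uniform(0, 30, 40)
sex = np.repeat([0.0, 1.0], 20)
y = 6 + 0.1 * yrs - 2 * sex + rng.normal(0, 0.5, 40)

ones = np.ones_like(yrs)
X_full = np.column_stack([ones, yrs, sex, sex * yrs])  # Yrs, SexCode, SexCode*Yrs
X_add = X_full[:, :3]                                   # Yrs, SexCode only

print(mallows_cp(X_full, X_full, y))  # equals 4 up to rounding
print(mallows_cp(X_add, X_full, y))   # varies with the random data
```

Rerunning with a different seed changes the second value, which illustrates that Cp itself is subject to random fluctuation.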
A little more than you need to know for this class: A good model for this dataset needs to take into account the fact that ES&H is bounded between 1 and 10. This can be done using a transformation on the response ES&H. A possible solution is a curve that looks like this:
Fitting curves such as these to the data, we get the following fitted line plot:
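The notes do not say which transformation produced these curves. One common choice for a response bounded between 1 and 10 is a logit of the rescaled response. This is an assumption here, not necessarily the method used above. The sketch assumes NumPy and hypothetical noise-free data. It fits a straight line on the transformed scale and maps it back, which gives an S-shaped curve that can never leave the interval (1, 10). Observed scores of exactly 1 or 10 would first have to be moved slightly inside the interval, because the logit is infinite at the bounds.

```python
import numpy as np

LOW, HIGH = 1.0, 10.0   # ES&H is bounded between 1 and 10

def to_logit(y):
    """Map a response strictly inside (LOW, HIGH) to the whole real line."""
    return np.log((y - LOW) / (HIGH - y))

def from_logit(z):
    """Inverse transform: an S-shaped curve that stays between LOW and HIGH."""
    return LOW + (HIGH - LOW) / (1 + np.exp(-z))

# Hypothetical noise-free data generated from such a curve (NOT the ES&H data)
yrs = np.linspace(0, 30, 31)
y = from_logit(-1.0 + 0.15 * yrs)

# Fit a straight line on the transformed scale, then transform back
X = np.column_stack([np.ones_like(yrs), yrs])
coef, *_ = np.linalg.lstsq(X, to_logit(y), rcond=None)
curve = from_logit(X @ coef)
print(coef)                       # recovers the intercept -1.0 and slope 0.15
print(curve.min(), curve.max())   # fitted values stay inside (1, 10)
```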