Fixing Problems - Normal Assumption and Equal Variance

Problems with the normal assumption often show up in the boxplots of the variables as severe outliers.
Consider, for example, the variable body weight in the Brain and Body Weight of 62 Mammals dataset; its boxplot shows several severe outliers.

We can try to fix this problem with a transformation. Three transformations are commonly used: square root, logarithm and inverse. Applying each of them to the body weight data and redrawing the boxplots shows that the log transform does a very good job.
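A boxplot flags points lying more than 1.5 × IQR beyond the quartiles (the usual whisker rule; plotting software may use slightly different quartile definitions). The minimal Python sketch below counts such points after each of the three transformations. The data are a hypothetical, deterministic right-skewed sample (values doubling from 1 to 2^20, spanning many orders of magnitude like animal weights), not the Mammals data, and `count_outliers` is an illustrative helper.

```python
import math
import statistics


def count_outliers(values):
    """Count points outside the boxplot whiskers (1.5 * IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < lower or v > upper)


# Hypothetical right-skewed toy sample (NOT the Mammals data):
# each value is twice the previous one, from 1 up to 2**20.
weights = [2.0 ** k for k in range(21)]

transforms = {
    "original": lambda v: v,
    "square root": math.sqrt,
    "log": math.log,
    "inverse": lambda v: 1.0 / v,
}

for name, f in transforms.items():
    transformed = [f(v) for v in weights]
    print(f"{name:12s} outliers: {count_outliers(transformed)}")
```

For this toy sample the counts are 4 (original), 3 (square root), 0 (log) and 4 (inverse). The square root only partly removes the skew, while the inverse overcorrects: it flips the skew, so the points now flagged are the smallest original values.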

Of course, in regression we have (at least) two variables, so we might see any of these combinations:

• upper left: x - good, y - good
• upper right: x - good, y - bad
• lower left: x - bad, y - good
• lower right: x - bad, y - bad

Depending on the situation, we might try transforming x, y, or both.
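To compare the four combinations, the minimal Python sketch below fits a least-squares line after optionally log-transforming x and/or y. The data are hypothetical: an exact power law y = 3·x^0.75 chosen for illustration, not the Mammals data. For such data only the log-log fit is a straight line, so it recovers the exponent 0.75 as its slope and ln 3 as its intercept. R² is printed only as a quick indicator; values computed on different y scales are not directly comparable, so the marginal plots and the residual normal plot remain the real check.

```python
import math


def ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b, sxy ** 2 / (sxx * syy)


# Hypothetical toy data following an exact power law (NOT the Mammals data).
x = [0.5, 2.0, 10.0, 50.0, 300.0, 2000.0]
y = [3.0 * xi ** 0.75 for xi in x]

options = {"none": lambda v: v, "log": math.log}

# Fit every combination of (transform of x, transform of y).
results = {}
for x_name, fx in options.items():
    for y_name, fy in options.items():
        a, b, r2 = ols([fx(v) for v in x], [fy(v) for v in y])
        results[(x_name, y_name)] = (a, b, r2)
        print(f"x: {x_name:4s}  y: {y_name:4s}  "
              f"intercept={a:8.3f}  slope={b:6.3f}  R^2={r2:.3f}")
```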

Example: Let's go back to the Mammals dataset. Here are the marginal plots and the normal plot of residuals for the original and the log-transformed data:

Clearly, the log-transformed data satisfy the assumptions much better.

Note that transforming the data also has a downside: it makes the model much harder to interpret.

Model in original units: brain weight (g) = 89.9 + 0.967 × body weight (kg)

Model in transformed units: log(brain weight) = 0.921 + 0.746 × log(body weight)

The original model tells us that each extra kilogram of body weight adds roughly one gram (0.967 g) of brain weight.
But what is the slope of 0.746 in the transformed model telling us?
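One way to read the slope is to back-transform the fitted model. The notes do not state the base of the logarithm, so the sketch below writes it as a general base b; the interpretation of the slope does not depend on b. Multiplying body weight by a factor c multiplies the fitted brain weight by c^0.746:

```latex
\log_b(\text{brain}) = 0.921 + 0.746\,\log_b(\text{body})
\quad\Longrightarrow\quad
\text{brain} = b^{0.921}\,(\text{body})^{0.746}

\frac{\text{brain}(c\cdot\text{body})}{\text{brain}(\text{body})} = c^{0.746},
\qquad
2^{0.746} \approx 1.68,
\qquad
1.01^{0.746} \approx 1.0075
```

So doubling body weight goes with about a 68% increase in fitted brain weight, and a 1% increase in body weight with roughly a 0.75% increase: the slope acts as a ratio of percentage changes rather than a change in grams per kilogram.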

Equal Variance

Sometimes a transformation of the response variable can help with this problem as well. More often, though, a more sophisticated method for analysing such a dataset is needed, such as weighted regression.
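Weighted regression is only named here, so the following is a minimal Python sketch of one standard form, weighted least squares for a straight line. It assumes the variance of each observation is known up to a constant, so that weights w_i = 1/Var(y_i) can be used. The data and the choice w_i = 1/x_i² (spread growing in proportion to x) are hypothetical, and `weighted_linear_fit` is an illustrative helper, not a library function.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x.

    Minimises sum(w_i * (y_i - a - b*x_i)**2) and returns (a, b).
    """
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products about the weighted means
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# Hypothetical data whose spread grows with x (NOT from the notes).
# Assumption: sd(y_i) is proportional to x_i, so Var(y_i) is proportional
# to x_i**2 and the weights are w_i = 1 / x_i**2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.8, 7.5, 8.2, 12.9, 11.0]
w = [1.0 / xi ** 2 for xi in x]

intercept, slope = weighted_linear_fit(x, y, w)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Observations with small weights (here the large-x points, assumed to be the noisiest) pull less on the fitted line; with all weights equal, the function reduces to ordinary least squares.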