Exercise Problems - Correlation and Regression

Problem 1 For each of the following 6 scatterplots, choose the most appropriate description of the correlation from the following list: the correlation is a) zero, b) weak positive, c) weak negative, d) strong positive, e) strong negative, f) does not apply.


Problem 2 Consider the data set for the cost and income of movies of Castle Rock Entertainment.
a) Find the correlation coefficient r. Write down all the details, as follows:

n=
∑X=
∑Y=
∑X²=
∑Y²=
∑XY=

SXX=
SYY=
SXY=

r=SXY/√(SXX×SYY)=

b) Suppose Castle Rock has to pay 30% tax on the Income, but can deduct 10 million dollars from the Income before taxes are applied. Their actual Profit is then given by Profit = 10 + 0.7*(Income-10). Compute the Profit for each movie and then show that cor(Cost, Profit) = cor(Cost, Income). Again, show the details of your calculations.


Problem 3 Again consider the data set for the cost and income of movies of Castle Rock Entertainment.
a) Find the least squares regression line, that is, find the equation y = b0 + b1x.
b) Use the least squares regression line to predict the income of a movie costing $50 million.


Problem 4

Suppose that we have X and Y data and find the least squares regression line to be Y = 5 - 0.9 X. If instead we run the regression with Y as the independent and X as the dependent variable we find X = 3 - 0.1 Y. Find the mean of the x values and the mean of the y values.



Solutions

Problem 1
Example 1: d) strong positive (actually r= 0.9)
Example 2: c) weak negative (actually r= -0.5)
Example 3: f) does not apply (outlier in upper right corner)
Example 4: a) zero
Example 5: e) strong negative (actually r= -0.99)
Example 6: f) does not apply (relationship is curved, parabola)


Problem 2
a) n=10
∑X=302
∑Y=535.2
∑X²=10742
∑Y²=53283.25
∑XY=20257.5

SXX=1622
SYY=24607
SXY=4085

r=SXY/√(SXX×SYY)=0.647
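
The steps above can be reproduced from the six summary sums. The following is a minimal Python sketch, assuming the usual shortcut formulas SXX = ∑X² - (∑X)²/n, SYY = ∑Y² - (∑Y)²/n and SXY = ∑XY - (∑X)(∑Y)/n. The function name corr_from_sums is only illustrative. Small differences from the rounded values listed above are possible.

```python
import math

def corr_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (SXX, SYY, SXY, r) computed from the summary sums."""
    sxx = sum_x2 - sum_x ** 2 / n       # SXX = sum(X^2) - (sum X)^2 / n
    syy = sum_y2 - sum_y ** 2 / n       # SYY = sum(Y^2) - (sum Y)^2 / n
    sxy = sum_xy - sum_x * sum_y / n    # SXY = sum(XY) - (sum X)(sum Y) / n
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    return sxx, syy, sxy, r

# Summary sums from Problem 2 a)
sxx, syy, sxy, r = corr_from_sums(10, 302, 535.2, 10742, 53283.25, 20257.5)
print(f"SXX={sxx:.1f}  SYY={syy:.1f}  SXY={sxy:.1f}  r={r:.3f}")
```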

b) Now Y is the Profit.
n=10
∑X=302
∑Y=404.8
∑X²=10742
∑Y²=28447.9
∑XY=15086

SXX=1622
SYY=12125
SXY=2859

r=SXY/√(SXX×SYY)=0.647
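
The equality in part b) is not specific to this data set: Profit = 10 + 0.7*(Income-10) = 3 + 0.7*Income is a linear transformation with positive slope, and such a transformation leaves r unchanged. A small Python sketch of this, using made-up cost and income values (hypothetical numbers for illustration only, not the Castle Rock data):

```python
import math

def corr(xs, ys):
    """Correlation coefficient r = SXY / sqrt(SXX * SYY)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values in millions of dollars (NOT the Castle Rock data)
cost = [10.0, 20.0, 30.0, 45.0, 60.0]
income = [5.0, 40.0, 35.0, 120.0, 90.0]

# Profit rule from part b)
profit = [10 + 0.7 * (y - 10) for y in income]

r_income = corr(cost, income)
r_profit = corr(cost, profit)
print(r_income, r_profit)  # the two values agree up to rounding error
```

The reason is that SXY is multiplied by 0.7 and SYY by 0.7² = 0.49, so the factor 0.7 cancels in r. The added constant has no effect because it drops out when the means are subtracted.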


Problem 3 a) We have

n=10
∑X=302
∑Y=535.2
∑X²=10742
∑XY=20257.5

SXX=1622
SXY=4085

b1 = SXY/SXX = 4085/1622 = 2.52
b0 = ȳ - b1·x̄ = 535.2/10 - 2.52·302/10 = -22.5 (using the unrounded slope 2.5185)

so the least squares regression line is Income = -22.5+2.52·Cost

b) For a movie costing $50 million we find Income = -22.5+2.52·50 = 103.5, that is, a predicted income of 103.5 million dollars.
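
The slope, intercept and prediction can be checked with a few lines of Python. This is a minimal sketch that takes the values of n, ∑X, ∑Y, SXX and SXY from part a) as given. The function name least_squares_line is only illustrative.

```python
def least_squares_line(n, sum_x, sum_y, sxx, sxy):
    """Return (b0, b1) for the least squares line y = b0 + b1*x."""
    b1 = sxy / sxx                      # slope
    b0 = sum_y / n - b1 * sum_x / n     # intercept: y-bar minus b1 times x-bar
    return b0, b1

# Values from Problem 3 a)
b0, b1 = least_squares_line(10, 302, 535.2, 1622, 4085)

# Predicted income (in millions) for a movie costing $50 million
predicted_income = b0 + b1 * 50
print(f"b0={b0:.2f}  b1={b1:.2f}  prediction={predicted_income:.1f}")
```

Because the code keeps the slope unrounded, its prediction comes out slightly below the 103.5 obtained with the rounded coefficients.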


Problem 4
We have two regression lines, one with X as the independent variable and another with Y as the independent variable, so we also have two intercepts (b0) and two slopes (b1). Every least squares regression line passes through the point of means (x̄, ȳ), so both equations hold at that point:

ȳ = 5 - 0.9·x̄
x̄ = 3 - 0.1·ȳ

Now we have a system of 2 linear equations in 2 unknowns, which we can solve (Precalculus, anyone?), for example via substitution:
x̄ = 3-0.1·ȳ, substitute in the first equation:
5 = ȳ+0.9·(3-0.1·ȳ) = 2.7+0.91·ȳ, or
ȳ = (5-2.7)/0.91 = 2.53
and
x̄ = 3-0.1·ȳ = 3-0.1·2.53 = 2.75
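
The system can also be checked numerically. The following is a minimal Python sketch, assuming only that both fitted lines pass through the point of means (x̄, ȳ), so that ȳ = 5 - 0.9·x̄ and x̄ = 3 - 0.1·ȳ. The function name means_from_two_lines is only illustrative.

```python
def means_from_two_lines(a, b, c, d):
    """Solve y = a + b*x and x = c + d*y for (x, y) by substitution."""
    # Substitute x = c + d*y into y = a + b*x:
    #   y = a + b*c + b*d*y   =>   y * (1 - b*d) = a + b*c
    denom = 1 - b * d
    if denom == 0:
        raise ValueError("lines are parallel or identical; no unique solution")
    y = (a + b * c) / denom
    x = c + d * y
    return x, y

# Y = 5 - 0.9 X  and  X = 3 - 0.1 Y
x_bar, y_bar = means_from_two_lines(5, -0.9, 3, -0.1)
print(f"mean of x = {x_bar:.2f}, mean of y = {y_bar:.2f}")
```

Plugging the two means back into both regression equations is a quick way to confirm the solution.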