Exercise - Basic Ideas

Problem 1

Consider the dataset Forbes.

Part 1

Say you wish to study the relationship between Assets and Sales. Are there any outliers? If so, what are they?

Part 2

Say you wish to study the relationship between Assets and Sector. Are there any outliers? If so, what are they?

Problem 2

Again consider the dataset Forbes. For each of the variables decide whether it comes from a normal distribution.

Solutions

Problem 1

Part 1

We need to look at three graphs:

• Boxplot of Assets. We find nine outliers. To identify them, go to Data > Sort, choose Sort column(s) Assets Companies, by column Assets, check the box Descending and hit Enter.
The nine companies with the highest Assets are IBM, Cigna, Mellon Bank, General Electric, Norwest, Bell Atlantic, Phillips Petroleum, American Electric Power and Allied Signal.

• Boxplot of Sales. We find six outliers. They are IBM, General Electric, Kroger, Cigna, Phillips Petroleum and United Technologies.

• Scatterplot of Assets by Sales. There seem to be four unusual observations. Identify them by clicking on the dots. They are IBM, Cigna, General Electric and Mellon Bank.
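
The points a boxplot marks as outliers can also be found by computation. Here is a minimal sketch in Python, assuming the common boxplot convention that a point is an outlier when it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The text does not say which rule the software applies, so this is an assumption. The function name and the toy numbers are hypothetical; they are not the Forbes data.

```python
from statistics import quantiles

def boxplot_outliers(values, labels, k=1.5):
    """Return the labels whose values fall outside the boxplot fences.

    Assumed rule: a value is an outlier if it is below Q1 - k*IQR
    or above Q3 + k*IQR, with k = 1.5 by default.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [lab for lab, v in zip(labels, values) if v < low or v > high]

# Made-up illustration: eight ordinary values and one very large one.
toy_values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
toy_labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
print(boxplot_outliers(toy_values, toy_labels))
```

The scatterplot part of the answer has no such simple rule; there the unusual points are judged by eye, as described above.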

Part 2

Sector is a categorical variable, so it cannot have outliers. Therefore we need to look at just one graph:

• Boxplot of Assets by Sector. We find one outlier in "Energy" (Phillips Petroleum), two in "Finance" (Cigna, Mellon Bank) and one in "Other" (Allied Signal).
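
A boxplot by group judges each observation against its own group, not against the whole dataset. The sketch below illustrates this idea in Python, again assuming the 1.5 interquartile range rule, which the text does not state. The function name, the group names "X" and "Y" and all numbers are hypothetical and are not taken from the Forbes data.

```python
from collections import defaultdict
from statistics import quantiles

def outliers_by_group(rows, k=1.5):
    """rows is an iterable of (label, group, value) triples.

    Returns a dict mapping each group to the labels flagged as
    outliers within that group (groups with no outliers are omitted).
    """
    groups = defaultdict(list)
    for label, group, value in rows:
        groups[group].append((label, value))

    flagged = {}
    for group, members in groups.items():
        values = [v for _, v in members]
        if len(values) < 2:
            continue  # quartiles need at least two data points
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        names = [lab for lab, v in members if v < low or v > high]
        if names:
            flagged[group] = names
    return flagged

# Made-up illustration with two groups of nine values each.
rows = [(f"x{i}", "X", v) for i, v in enumerate([1, 2, 3, 4, 5, 6, 7, 8, 100], 1)]
rows += [(f"y{i}", "Y", v) for i, v in enumerate(range(10, 19), 1)]
print(outliers_by_group(rows))
```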


Problem 2

• Assets - boxplot - many outliers - not normal
• Sales - boxplot - many outliers - not normal
• Market Value - boxplot - many outliers - not normal
• Profits - boxplot - many outliers - not normal
• Cash Flow - boxplot - many outliers - not normal
• Employees - boxplot - many outliers - not normal
• Sector - categorical, not numeric - not normal
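
The step from "many outliers" to "not normal" can be made a little more concrete. Under the 1.5 interquartile range rule (an assumption, since the text does not name the rule), data from a normal distribution produce boxplot outliers only rarely, in well under 1% of observations. The following Python sketch computes that expected fraction and the observed fraction in a sample. The function names and the toy numbers are hypothetical.

```python
from statistics import NormalDist, quantiles

def expected_outlier_fraction(k=1.5):
    """Fraction of a normal population lying outside the boxplot fences."""
    z = NormalDist()               # standard normal
    q3 = z.inv_cdf(0.75)           # third quartile; Q1 = -q3 by symmetry
    fence = q3 + k * (2 * q3)      # upper fence, since IQR = 2 * q3
    return 2 * (1 - z.cdf(fence))  # both tails

def observed_outlier_fraction(values, k=1.5):
    """Fraction of the sample flagged by the same fence rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    n_out = sum(1 for v in values if v < q1 - k * iqr or v > q3 + k * iqr)
    return n_out / len(values)

print(round(expected_outlier_fraction(), 4))  # about 0.007
print(observed_outlier_fraction([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

An observed fraction far above the expected one, in a reasonably large sample, points the same way as the boxplots above. A small sample can show an outlier or two by chance, so this is a rough check, not a formal test.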