Data Summaries 4 - Multivariate Data

Contents of this page:
Discrete - Discrete
Discrete - Continuous
Continuous - Continuous
Other Graphs

Discrete - Discrete

Example: Consider the dataset on Breaking Cocaine Addiction.

Here, for each subject we have two variables: "Drug", with values "Desipramine", "Lithium" and "Placebo", and "Relapsed", with values "Yes" and "No". Both variables are discrete.
Usually the first thing to do with this type of data is to just count each combination of values and write them up in a contingency table:


In MINITAB we have Stat > Tables, Cross Tabulation to count for us.
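Outside MINITAB, the same counting can be sketched in a few lines of Python. This is only an illustration: the name cross_tabulate is made up here, and the records are hypothetical, with 24 subjects per drug assumed so that the counts are consistent with the relapse rates quoted on this page.

```python
from collections import Counter

# Hypothetical records, one (drug, relapsed) pair per subject.
# ASSUMPTION: 24 subjects per drug, chosen to be consistent with the
# relapse rates quoted on this page; the real data file is not read here.
CELL_COUNTS = {
    ("Desipramine", "Yes"): 10, ("Desipramine", "No"): 14,
    ("Lithium", "Yes"): 18, ("Lithium", "No"): 6,
    ("Placebo", "Yes"): 20, ("Placebo", "No"): 4,
}
records = [pair for pair, n in CELL_COUNTS.items() for _ in range(n)]


def cross_tabulate(records):
    """Count each (row value, column value) combination."""
    cells = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    return {r: {c: cells[(r, c)] for c in cols} for r in rows}


table = cross_tabulate(records)
for drug, row in table.items():
    print(f"{drug:12s}", row)
```

If the pandas library is available, its crosstab function does the same job directly on two columns of a data frame.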

If the table is for publication you probably want to add some row and column totals:

Often these tables show percentages instead of the counts (frequencies). Here, though, there are three types of percentages:

Percentages based on Grand Total:

Percentages based on Row Totals:

Percentages based on Column Totals:

Which of these four tables is the most interesting? It depends on the story behind the data and the result you wish to highlight. Here it is probably the third table, which shows clearly that the "relapse rate" for Desipramine (41.7%) is much smaller than for either Lithium (75%) or the Placebo (83.3%).
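The three kinds of percentages differ only in which total each cell is divided by. A minimal sketch, assuming hypothetical counts with 24 subjects per drug (a group size picked so that the percentages within each drug come out as 41.7%, 75% and 83.3%); the function name percentages and its basis argument are made up for this illustration:

```python
# Hypothetical counts (rows: Drug, columns: Relapsed).
# ASSUMPTION: 24 subjects per drug; not taken from the actual data file.
COUNTS = {
    "Desipramine": {"Yes": 10, "No": 14},
    "Lithium": {"Yes": 18, "No": 6},
    "Placebo": {"Yes": 20, "No": 4},
}


def percentages(counts, basis):
    """Cell percentages based on the 'grand', 'row' or 'column' total."""
    row_total = {r: sum(cells.values()) for r, cells in counts.items()}
    col_total = {}
    for cells in counts.values():
        for c, n in cells.items():
            col_total[c] = col_total.get(c, 0) + n
    grand_total = sum(row_total.values())

    def denominator(r, c):
        if basis == "grand":
            return grand_total
        if basis == "row":
            return row_total[r]
        if basis == "column":
            return col_total[c]
        raise ValueError("basis must be 'grand', 'row' or 'column'")

    return {
        r: {c: round(100 * n / denominator(r, c), 1) for c, n in cells.items()}
        for r, cells in counts.items()
    }


for basis in ("grand", "row", "column"):
    print(basis, percentages(COUNTS, basis))
```

With Drug in the rows, the "row" basis gives the relapse rate within each drug, which is the comparison highlighted above.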

The standard graph for this type of data is a multiple bar chart. It is done by Graph > Bar Chart, Cluster. There are always two versions, depending on which way the bars are grouped together; see
• Drug Relapse
• Relapse Drug

As with the tables, the graphs can also show each of the three types of percentages. Here is how to draw the one based on row percentages:
• Graph > Bar Chart, Cluster, Categorical Variables: Drug Relapse, Bar Chart Options: Show Y as Percent, Within Categories of level 1
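For readers working outside MINITAB, a clustered bar chart can be sketched in Python as follows. The "Yes" percentages are the relapse rates quoted above and the "No" values are simply their complements. The helper names cluster_layout and plot_clustered are made up for this illustration, and the plotting step assumes the third-party matplotlib library is installed.

```python
# Percent within each drug: "Yes" values as quoted in the text,
# "No" values computed as 100 minus the "Yes" value.
ROW_PCT = {
    "Desipramine": {"Yes": 41.7, "No": 58.3},
    "Lithium": {"Yes": 75.0, "No": 25.0},
    "Placebo": {"Yes": 83.3, "No": 16.7},
}


def cluster_layout(groups, series, width=0.8):
    """Return ({series: x positions}, bar width) for bars clustered by group."""
    bar_w = width / len(series)
    positions = {
        s: [g + (j - (len(series) - 1) / 2) * bar_w for g in range(len(groups))]
        for j, s in enumerate(series)
    }
    return positions, bar_w


def plot_clustered(pct, ylabel="Percent within drug"):
    """Draw one cluster of bars per drug, one bar per value of Relapsed."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    groups = list(pct)
    series = list(next(iter(pct.values())))
    positions, bar_w = cluster_layout(groups, series)
    fig, ax = plt.subplots()
    for s in series:
        heights = [pct[g][s] for g in groups]
        ax.bar(positions[s], heights, width=bar_w, label=f"Relapsed: {s}")
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(groups)
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig
```

Calling plot_clustered(ROW_PCT) groups the bars by Drug. Swapping the roles of the two variables in the input gives the other grouping.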

For more on this see page 167 of the textbook.

Discrete - Continuous

Consider the dataset Drug Use of Mothers and the Health of the Newborn. Here we have one continuous variable ("Length") and one discrete variable ("Status").

For this type of data we might first compute the summary statistics for each group separately:

In MINITAB we find these summaries with Stat > Basic Statistics, Display Descriptive Statistics.

Note that the discussion on Mean vs. Median still holds: If the distributions are not bell-shaped it is probably better to use the median and IQR here.
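As an illustration of computing both kinds of summaries group by group, here is a short Python sketch using only the standard library. The lengths and the group labels are made up, not taken from the dataset, and the function name summarize is hypothetical.

```python
import statistics


def summarize(groups):
    """Mean, standard deviation, median and IQR for each group separately."""
    summary = {}
    for name, values in groups.items():
        q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartiles
        summary[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),
            "median": statistics.median(values),
            "iqr": q3 - q1,
        }
    return summary


# Made-up lengths for two hypothetical values of "Status".
lengths = {
    "Group A": [48.0, 49.5, 50.0, 51.0, 51.5, 52.0, 53.0],
    "Group B": [46.0, 47.0, 48.5, 49.0, 49.5, 50.0, 58.0],
}

for name, stats in summarize(lengths).items():
    print(name, stats)
```

Software packages use slightly different conventions for quartiles, so the IQR from this sketch may not agree exactly with the one MINITAB reports.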

The standard graph for this data is a multiple boxplot. Note that all the boxes are on the same scale!

To draw this graph in MINITAB use Graph > Boxplot > with Groups, Graph Variables Length, Categorical Variable Drug Use
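A comparable graph can be sketched in Python. The helper names split_by_group and multiple_boxplot are made up for this illustration, the input is assumed to be a list of (category, value) pairs, and the drawing step assumes the third-party matplotlib library is installed.

```python
from collections import defaultdict


def split_by_group(records):
    """Collect the continuous values separately for each category."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return dict(groups)


def multiple_boxplot(records, ylabel="Length"):
    """Draw one box per category, all on the same vertical scale."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    groups = split_by_group(records)
    fig, ax = plt.subplots()
    ax.boxplot(list(groups.values()))  # one box per group, shared y-axis
    ax.set_xticklabels(list(groups))   # label each box with its category
    ax.set_ylabel(ylabel)
    return fig
```

Because all boxes share one axis, differences in center and spread between the groups can be read off directly.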

Continuous - Continuous

Here we usually start by drawing a scatterplot. As an example, consider the Olympic Men's Long Jump data. To draw the scatterplot, use Graph > Scatterplot.
To identify an individual observation, move the cursor over its dot.
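A scatterplot can also be sketched in Python. The function names to_columns and scatterplot are made up for this illustration, the (year, distance) pairs are illustrative values in metres rather than the course dataset, and the drawing step assumes the third-party matplotlib library is installed.

```python
def to_columns(records):
    """Sort (x, y) pairs by x and return the two columns separately."""
    ordered = sorted(records)
    return [x for x, _ in ordered], [y for _, y in ordered]


def scatterplot(records, xlabel="Year", ylabel="Winning distance"):
    """Plot one dot per observation."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    xs, ys = to_columns(records)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


# Illustrative (year, metres) pairs; ASSUMED values, not read from the dataset.
example = [(1968, 8.90), (1896, 6.35), (1936, 8.06), (1988, 8.72)]
years, distances = to_columns(example)
```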

For more on the scatterplot see page 178 of the textbook.

Other Graphs

Example: Sex Ratio by Location. The following graph is a map of the USA, with each state showing the number of males per 100 females (taken from US Census 2000):
Gender Ratios
The ratio appears to go up (meaning relatively more men) as we move from east to west. Any idea why this might be?