| Contents of this page |
|---|
| Graphs for Discrete Data |
| Totals (Frequencies) vs. Percentages |
| A Table and a Graph for Continuous Data |
Example: Consider the variable Gender in our WRInc data. Clearly this is discrete data. Usually the first thing to do is simply count how many observations fall into each category. You can use the Stat > Tables command in MINITAB to do the counting:
Stat > Tables > Tally Individual Variables, Variable=Gender. We see that there are 9510 Female and 14281 Male employees.
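Outside MINITAB, the same tally is easy to sketch in Python with `collections.Counter` from the standard library. The short `gender` list below is made-up example data standing in for the Gender column, not the real WRInc values.

```python
from collections import Counter

# Made-up stand-in for the Gender column (not the real WRInc data)
gender = ["Female", "Male", "Male", "Female", "Male"]

# Counter maps each category to the number of times it occurs
tally = Counter(gender)

for category, count in sorted(tally.items()):
    print(category, count)
```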
Example: According to the US Department of Education, there were 12,263,000 undergraduate students in US colleges in 1994. Their breakdown by race was as follows:
| Race | Number |
|---|---|
| American Indian | 117000 |
| Asian | 674000 |
| Black | 1317000 |
| Hispanic | 968000 |
| White | 8916000 |
If a table is used for presentation, it should usually include a little more information (here, percentages and a total) and possibly a better ordering, for example by size. Also, large numbers are often expressed in bigger units, here thousands:
| Race | Number (in 1000s) | Percentage (%) |
|---|---|---|
| White | 8,916 | 72.7 |
| Black | 1,317 | 10.7 |
| Hispanic | 968 | 7.9 |
| Asian | 674 | 5.5 |
| American Indian | 117 | 1.0 |
| Not in the groups above | 271 | 2.2 |
| Total | 12,263 | 100 |
To compute the percentages we divide each count by the total and multiply by 100. Note that the five groups in the first table add up to only 11,992 (thousand), so 271 (thousand) of the 12,263 (thousand) students are not in any of the listed groups; the percentages are computed relative to the full total of 12,263 (thousand). When the Number column contains every group, its total is found with Calc > Column Statistics, Statistic=Sum, Input variable=Number, and the percentages with Calc > Calculator, Store in c3, Expression 'Number' / SUM('Number') * 100. Percentages are usually rounded either to the nearest integer or to one decimal place, which we can do with Calc > Calculator, Store in c3, Expression ROUND('Number' / SUM('Number') * 100, 1)
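A minimal Python sketch of building the presentation table, assuming the counts are typed in by hand (in thousands): it orders the groups by size, divides by the reported total of 12,263 rather than by the sum of the five groups, and rounds to one decimal place with the built-in `round`.

```python
# Counts in thousands, from the Department of Education table
counts = {
    "American Indian": 117,
    "Asian": 674,
    "Black": 1317,
    "Hispanic": 968,
    "White": 8916,
}
total = 12263  # reported total in thousands (larger than the sum of the five groups)

# Order the groups by size, largest first
rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)

percentages = {}
for race, number in rows:
    # Divide by the total, multiply by 100, round to one decimal place
    percentages[race] = round(number / total * 100, 1)
    print(f"{race:16} {number:6,} {percentages[race]:5.1f}")
```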
Some discrete variables have a built-in (natural) ordering, for example t-shirt size (small, medium, large, x-large) or grades (A, B, ...). Such an ordering can also be used instead of ordering by size.
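When categories have a natural ordering, the sort can follow that ordering instead of the counts. A Python sketch; the t-shirt counts are made up for illustration.

```python
# Made-up t-shirt size counts (illustration only)
counts = {"large": 40, "small": 25, "x-large": 10, "medium": 55}

# The built-in ordering of the categories, smallest size first
natural_order = ["small", "medium", "large", "x-large"]

# Sort by position in the natural order, not alphabetically or by count
ordered = sorted(counts.items(), key=lambda item: natural_order.index(item[0]))

for size, count in ordered:
    print(size, count)
```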
Graph for discrete data: Bar chart
Use Graph > Bar Chart, Values from a Table, Graph Variables=Number, Categorical Variable=Race.
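The idea behind a bar chart (one bar per category, with length proportional to its count) can be sketched without a plotting library as a text chart in Python. This only illustrates the principle and is not what MINITAB does; the scale of one `#` per 200 thousand students is an arbitrary choice.

```python
# Counts in thousands, ordered by size
counts = [
    ("White", 8916),
    ("Black", 1317),
    ("Hispanic", 968),
    ("Asian", 674),
    ("American Indian", 117),
]
unit = 200  # one '#' per 200 thousand students (arbitrary scale)


def text_bar_chart(items, unit):
    """Return one line per category with a bar of '#' proportional to its count."""
    lines = []
    for label, value in items:
        bar = "#" * round(value / unit)
        lines.append(f"{label:16}|{bar} {value:,}")
    return lines


chart = text_bar_chart(counts, unit)
print("\n".join(chart))
```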
Example: This is a nice, professional table from the website of the CDC (Centers for Disease Control and Prevention) about the dangers of smoking:
For more on bar charts see page 73 of the textbook.
Totals (Frequencies) vs. Percentages
--- If the data are a random sample from a larger population, percentages are often better:
Example: Of 150 randomly selected people in a phone survey, 85 said they would vote for candidate AA in the next election. Since this is a sample, report 57% (85/150 is about 0.567) instead.
Example: In a company with 150 employees, 85 said they like their job. Here the employees are the whole population rather than a sample, so use these numbers (85 of 150) directly.
--- For small totals use frequencies; for large totals use percentages.
--- These are just guidelines; there can always be exceptions if there is a good reason.
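For the phone-survey example, converting a count to a percentage is one line of arithmetic. A small Python helper (the name `as_percentage` is hypothetical) makes the rounding explicit.

```python
def as_percentage(count, n, digits=0):
    """Return count / n as a percentage rounded to `digits` decimal places."""
    return round(count / n * 100, digits)


# Phone survey example: 85 of 150 randomly selected people favor candidate AA
survey_pct = as_percentage(85, 150)         # 56.67 rounds to 57.0
print(f"{survey_pct:.0f}%")                  # 57%
print(as_percentage(85, 150, digits=1))     # 56.7
```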
Example: Consider the WRInc data again. If you want to make a table for the variable "Satisfaction", it is clear what it will look like:
| Rating | Frequency |
|---|---|
| 1 | 3096 |
| 2 | 2783 |
| 3 | 3854 |
| 4 | 7683 |
| 5 | 6375 |
We could also group the ratings into fewer classes, for example:
| Rating | Frequency |
|---|---|
| 1 or 2 | 5879 |
| 3 or 4 | 11537 |
| 5 | 6375 |
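Combining categories only requires adding up their frequencies. A Python sketch using the Satisfaction frequencies from the first table; the dictionary `classes` is one way to describe the grouping.

```python
# Frequencies of the Satisfaction ratings, from the first table
freq = {1: 3096, 2: 2783, 3: 3854, 4: 7683, 5: 6375}

# Which combined class each rating belongs to
classes = {1: "1 or 2", 2: "1 or 2", 3: "3 or 4", 4: "3 or 4", 5: "5"}

combined = {}
for rating, count in freq.items():
    label = classes[rating]
    combined[label] = combined.get(label, 0) + count  # add to the class total

print(combined)
```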
But what about a continuous variable, say "Income"? Again, for a table we need classes, but now it is not at all clear what they should be; in fact there are many possibilities. First of all, a "class" will now be an interval, for example 10000-20000.
Follow these steps to construct a table (now called a frequency table) for continuous data:
1) find the range = largest observation - smallest observation
2) decide on the number of classes, usually at least 5. In general, the more observations we have, the more classes we use.
3) calculate the class width = range / number of classes, rounded up to a nice number.
4) choose a left endpoint (at or below the smallest observation) so that each observation falls into one and only one class.
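Steps 1 to 4 can be sketched in Python. "Rounded to a nice number" is a judgment call; the hypothetical helper `nice_ceiling` assumes the common convention of rounding up to 1, 2 or 5 times a power of 10, which happens to turn 8040 into 10000 as in the Income example below. All function names here are illustrative.

```python
import math


def nice_ceiling(x):
    """Round x (> 0) up to 1, 2 or 5 times a power of 10.

    The 1-2-5 convention is an assumption made for this sketch.
    """
    power = 10 ** math.floor(math.log10(x))
    for step in (1, 2, 5):
        if step * power >= x:
            return step * power
    return 10 * power


def plan_classes(smallest, largest, num_classes):
    """Steps 1-3: return the range, the raw width and the rounded-up width."""
    data_range = largest - smallest         # step 1
    raw_width = data_range / num_classes    # step 3 before rounding
    return data_range, raw_width, nice_ceiling(raw_width)


def classes_needed(left, width, largest):
    """Step 4 check (assumes left <= smallest): classes needed to reach `largest`."""
    return math.floor((largest - left) / width) + 1


data_range, raw_width, width = plan_classes(8000, 88400, 10)
print(data_range, raw_width, width)          # 80400 8040.0 10000
print(classes_needed(5000, width, 88400))    # 9
```

Because the width was rounded up from 8040 to 10000, nine classes starting at 5000 already reach the largest income, one fewer than the ten originally planned.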
Example: Let's make a frequency table for the incomes of the employees of WRInc.
Stat > Basic Statistics > Display Descriptive Statistics, Variables=Income shows Min=8000 and Max=88400
Now
1) range = 88400-8000 = 80400
2) number of classes = 10 (just as an example)
3) class width = 80400/10 = 8040, rounded up to 10000
4) Left endpoint: 5000
Since 5000+10000=15000, 15000+10000=25000, and so on, the classes are:
| Class |
|---|
| 5000 - 15000 |
| 15000 - 25000 |
| 25000 - 35000 |
| 35000 - 45000 |
| 45000 - 55000 |
| 55000 - 65000 |
| 65000 - 75000 |
| 75000 - 85000 |
| 85000 - 95000 |
| 95000 - 105000 |
But there is a problem: Calc > Calculator, Store in c7, Expression SUM('Income' = 15000) gives 10, so these 10 employees would fall into both the first and the second class, which is not allowed!
To fix this, we (slightly) change the classes, which works as long as the incomes are whole numbers:
5000-14999, 15000-24999, and so on.
So how many employees have an income between 5000 and 14999? We can use Minitab to find out: Calc > Calculator, Store in c7, Expression SUM('Income' <= 14999) gives 141. (Since the smallest income is 8000, there is no need to also require 'Income' >= 5000.)
For the next class we find:
Calc > Calculator, Store in c7, Expression SUM('Income' <= 24999) gives 4523,
so 4523 - 141 = 4382 employees have an income in the class 15000-24999.
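The cumulative-count trick (count everything up to the upper limit of a class, then subtract the count up to the previous upper limit) can be sketched in Python. The hypothetical function `count_at_most` plays the role of SUM('Income' <= limit); the incomes are made up for illustration.

```python
# Made-up incomes (not the WRInc data)
incomes = [8000, 12500, 15000, 15000, 19999, 24999, 25000, 31000]


def count_at_most(values, limit):
    """Number of values <= limit, like SUM('Income' <= limit) in Minitab."""
    return sum(1 for v in values if v <= limit)


first = count_at_most(incomes, 14999)             # class 5000-14999
second = count_at_most(incomes, 24999) - first    # class 15000-24999
print(first, second)
```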
Continuing like this for the remaining classes, we find:
| Class | Frequency |
|---|---|
| 5000 - 14999 | 141 |
| 15000 - 24999 | 4382 |
| 25000 - 34999 | 9820 |
| 35000 - 44999 | 6727 |
| 45000 - 54999 | 2187 |
| 55000 - 64999 | 459 |
| 65000 - 74999 | 65 |
| 75000 - 84999 | 9 |
| 85000 - 94999 | 1 |
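Instead of one cumulative count per class, the whole table can be computed in a single pass: for whole-number incomes, the class of a value is found from (value - left endpoint) // width. A Python sketch with made-up incomes; the function name `frequency_table` is hypothetical.

```python
def frequency_table(values, left, width, num_classes):
    """Count whole-number values into classes left..left+width-1, and so on.

    Assumes left <= value < left + num_classes * width for every value.
    """
    counts = [0] * num_classes
    for v in values:
        index = (v - left) // width  # which class v falls into
        counts[index] += 1
    labels = [f"{left + i * width} - {left + (i + 1) * width - 1}" for i in range(num_classes)]
    return list(zip(labels, counts))


# Made-up incomes (not the WRInc data)
incomes = [8000, 12500, 15000, 15000, 19999, 24999, 25000, 31000]
table = frequency_table(incomes, 5000, 10000, 3)
for label, count in table:
    print(label, count)
```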
A histogram is just like a bar chart for the frequency table: each class gets a bar whose height is the class frequency. Note, though, that in a histogram there are no spaces between the bars, because the classes are adjacent intervals. In Minitab use Graph > Histogram to draw the graph. You can double-click on the graph to change the number of classes.
In general it is a good idea to draw several histograms with different numbers of classes, since the shape of the histogram can change with the choice of classes. Try some!
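To see how the number of classes changes the picture, a Python sketch can recompute the class counts for several choices and print each as a small text histogram. For simplicity it uses equal-width classes running from the smallest to the largest value with unrounded widths, unlike the "nice" widths above, and puts the largest value into the last class. The data are made up, and the sketch assumes the values are not all equal.

```python
def histogram_counts(values, num_classes):
    """Counts for num_classes equal-width classes spanning min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        # min() puts the largest value into the last class instead of one past it
        index = min(int((v - lo) / width), num_classes - 1)
        counts[index] += 1
    return counts


# Made-up data (illustration only)
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]
for k in (3, 5, 7):
    counts = histogram_counts(data, k)
    print(k, counts)
    for c in counts:
        print("  " + "#" * c)
```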
For more on histograms see page 78 of the textbook.