Data Summaries 3

Contents of this page:
Percentiles
Some Special Percentiles: Quartiles
Boxplot (Box and Whisker diagram)

Percentiles (Measures of Location)

The pth percentile of an ordered data set is the value that has at most p% of the data below it and at most (100-p)% above it.

Steps to find the pth percentile:
1) Order the data set
2) Find the location of the pth percentile: compute np/100, where n is the number of observations, and round up
3) The pth percentile is the observation at that location.

Example: Find the 67th percentile of Babe Ruth's Homerun data
Solution:
np/100 = 15*67/100 = 10.05, round up to 11
The 11th observation in the ordered data is 49

So the 67th percentile of Babe Ruth's Homerun data is 49
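
The three steps can be sketched in Python as follows. This is only an illustration: the function name percentile is made up for this sketch, and the data values are an assumption, since the data set is not listed on this page. They are Babe Ruth's 15 season home run totals as a Yankee as commonly tabulated, and they do reproduce the answer above.

```python
import math

def percentile(data, p):
    """Return the pth percentile using the 'np/100, round up' location rule."""
    ordered = sorted(data)                        # step 1: order the data set
    n = len(ordered)
    location = max(math.ceil(n * p / 100), 1)     # step 2: location, rounded up (at least 1)
    return ordered[location - 1]                  # step 3: locations count from 1, Python from 0

# Assumed data (not given on this page): Ruth's 15 seasons with the Yankees
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

print(percentile(ruth, 67))   # 49
```

Note that the function returns the observation at location 11, not the number 11 itself.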

Warning: A very common mistake is to think that the number found in step 2 is already the percentile. It is only the location of the percentile in the ordered data set. Again, if you make that mistake, most of the time your answer will make no sense! In the example above we found 11, but how could that be the 67th percentile? It is smaller than every one of the observations.
Also note that, the way they are defined here, percentiles are always actual observations.

Example: Find the 80th percentile of the following dataset:

MINITAB Ver. 14 or lower does not have a command to find percentiles, but at least you can use the command Data > Sort to do the ordering.

Quartiles, Five-Number Summary and IQR

1st quartile Q1 = 25th percentile
3rd quartile Q3 = 75th percentile

Using these we can also find the Interquartile Range IQR = Q3 - Q1
and the 5-number summary:
Minimum Q1 Median Q3 Maximum

Example: Babe Ruth again:
Minimum Q1 Median Q3 Maximum
22 35 46 54 60

IQR = 54-35 = 19
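
As a check, the five-number summary and the IQR can be computed with the same round-up rule. This is a sketch under the same assumptions as before: the helper names are hypothetical, and the data values (Ruth's 15 season totals as a Yankee) are not listed on this page.

```python
import math

def percentile(data, p):
    """pth percentile via the 'np/100, round up' location rule."""
    ordered = sorted(data)
    location = max(math.ceil(len(ordered) * p / 100), 1)
    return ordered[location - 1]

def five_number_summary(data):
    """Return (Minimum, Q1, Median, Q3, Maximum)."""
    return (min(data),
            percentile(data, 25),    # Q1 = 25th percentile
            percentile(data, 50),    # Median = 50th percentile
            percentile(data, 75),    # Q3 = 75th percentile
            max(data))

# Assumed data (not given on this page)
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

minimum, q1, median, q3, maximum = five_number_summary(ruth)
iqr = q3 - q1
print(minimum, q1, median, q3, maximum)   # 22 35 46 54 60
print(iqr)                                # 19
```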

The Stat > Basic Statistics > Display Descriptive Statistics command finds Q1 and Q3 as well. Note that MINITAB uses slightly different formulas from those above, so the answers can be a bit different.

Example: Find the 5-number summary and the IQR of the incomes of WRInc
Minimum Q1 Median Q3 Maximum
8000 26600 32400 39200 88400

IQR = 39200-26600 = 12600

What is the meaning of these percentiles?
• Q1 = P25 = $26600, so 25% (or 1 in 4) of the employees make less than $26600.
• Median = $32400, so half of the employees make less than $32400, half make more.
• Q3 = P75 = $39200, so 25% (or 1 in 4) of the employees make more than $39200.

What is the meaning of IQR? It is a third way to calculate a measure of "spread-out-ness", after the range and the standard deviation. To make them comparable, though, we need to divide IQR by 1.35:

Example: The standard deviation of the incomes is s = 9424. Now IQR/1.35 = 12600/1.35 = 9333, which is close to s.
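
The page does not say where 1.35 comes from. A likely explanation, offered here as an assumption, is that for bell-shaped (normal) data the IQR is about 1.35 standard deviations wide. The short Python sketch below checks that figure with the standard library and redoes the division for the income example.

```python
from statistics import NormalDist

# IQR of a normal distribution with standard deviation 1:
# distance between its 75th and 25th percentiles
standard_normal = NormalDist(mu=0, sigma=1)
iqr_in_sd_units = standard_normal.inv_cdf(0.75) - standard_normal.inv_cdf(0.25)
print(round(iqr_in_sd_units, 3))      # 1.349

# Income example from the text
iqr = 12600
s = 9424
print(round(iqr / 1.35))              # 9333, compare with s = 9424
```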

Now we have several formulas (methods) for finding an "average" (mean, median) and a "spread-out-ness" (range/4, s, IQR/1.35). How do you decide which to use?

• Use the range only if you can't find either of the other two, for example if you only know the smallest and the largest observation, or if you have to do a quick calculation in your head.

• Decide whether to use the mean or the median, as we discussed before.

• If you use the mean, also use the standard deviation. If you use the median, use IQR/1.35.

For more on percentiles, quartiles etc. see section 3.5 of the textbook.

Boxplot (Box and Whisker diagram)

From the 5-number summary we can construct another graph for continuous data, the boxplot:
1) Draw a scale, that is a number line, covering the range of the data.
2) Draw a box from Q1 to Q3 with a vertical line at the median.
3) Find the lower fence LF = Q1 - 1.5*IQR.
If LF < Min, draw a line from the left edge of the box to Min.
If LF > Min, draw a line from the left edge of the box to LF and add a star for each observation < LF.

4) Find the upper fence UF = Q3 + 1.5*IQR.
If UF > Max, draw a line from the right edge of the box to Max.
If UF < Max, draw a line from the right edge of the box to UF and add a star for each observation > UF.

Example: Babe Ruth's homeruns:
Minimum Q1 Median Q3 Maximum
22 35 46 54 60

and so IQR = Q3-Q1 = 54-35 = 19
therefore we find
LF = Q1 - 1.5*IQR = 35-1.5·19 = 6.5 < Min = 22
UF = Q3 + 1.5*IQR = 54+1.5·19 = 82.5 > Max = 60
so both lines run all the way to Min and Max, and no observation gets a star.
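
The fence calculation in steps 3 and 4 can be sketched in Python from the five-number summary alone. The function name whisker_ends is hypothetical, and the sketch follows the rule as described on this page, where a line stops at the fence whenever the fence lies inside the data range.

```python
def whisker_ends(minimum, q1, q3, maximum):
    """Return (LF, UF, left end, right end) for a boxplot drawn as described above."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    left_end = max(lower_fence, minimum)     # LF < Min: line to Min, else line to LF
    right_end = min(upper_fence, maximum)    # UF > Max: line to Max, else line to UF
    return lower_fence, upper_fence, left_end, right_end

# Babe Ruth: Minimum 22, Q1 35, Q3 54, Maximum 60
lf, uf, left, right = whisker_ends(22, 35, 54, 60)
print(lf, uf)         # 6.5 82.5
print(left, right)    # 22 60
```

Observations below the left end or above the right end would be the ones marked with a star; for this data there are none.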

Example: Newcomb's Speed of Light measurements
The five-number summary is:
Minimum Q1 Median Q3 Maximum
-44 24 27 31 40

IQR = 31-24 = 7
LF = Q1 - 1.5*IQR = 24-1.5·7 = 13.5 > Min = -44
UF = Q3 + 1.5*IQR = 31+1.5·7 = 41.5 > Max = 40
Here is how to draw the boxplot, step by step:


and the final result:

Observations marked with a star are (possible) outliers, that is "unusual" observations.

In MINITAB we have Graph > Boxplot. The exact details of how a boxplot is drawn differ slightly from those described above, so the resulting plot can look a little different.

Example: Graph > Boxplot, Simple, Graph Variable=Income

For more on the boxplot see page 145 of the textbook.