Population: all of the entities (people, events, things etc.) that are the focus of a study
Sample: any subset of the population
Parameter: any numerical quantity associated with a population
Statistic: any numerical quantity associated with a sample
After our discussion of probability, we can now be a little more precise.
Example: Say we roll a fair die until the first time we get a six. We are always in one of two situations:
Let X be the number of rolls needed until the first six. For the first few values of k we find:
| k | 1 | 2 | 3 | 4 | 5 | ... |
|---|---|---|---|---|---|---|
| P(X=k) | 0.167 | 0.139 | 0.116 | 0.096 | 0.080 | ... |
• mean m = 1/(1/6) = 6.0
• standard deviation s = √(5/6)/(1/6) = 5.48
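The population values above can be reproduced with a few lines of Python. This is only a sketch: it assumes X follows the geometric distribution with success probability p = 1/6, so that P(X=k) = (5/6)^(k-1)·(1/6), the mean is 1/p and the standard deviation is √(1-p)/p. The names `p`, `pmf`, `probs`, `mean` and `sd` are chosen here for illustration.

```python
import math

p = 1 / 6  # probability of a six on one roll of a fair die


def pmf(k: int) -> float:
    """P(X = k): k-1 rolls that are not a six, followed by a six."""
    return (1 - p) ** (k - 1) * p


# Probabilities for k = 1..5, rounded as in the table above
probs = [round(pmf(k), 3) for k in range(1, 6)]

mean = 1 / p                # population mean
sd = math.sqrt(1 - p) / p   # population standard deviation

print(probs)
print(round(mean, 2), round(sd, 2))
```

Because these numbers come from the probability model itself and not from any data, they describe the whole population.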
In fact there are formulas for all kinds of summary statistics. For example here we would find:
• third quartile Q3=8
• 95th Percentile P95=16
and so on. But because they are computed for the whole population they are parameters.
6 1 7 4 1 9 9 13 30 1 3 1 15 14 5 7 5 11 6 8 6 5 12 2 10
2 2 25 2 3 8 4 1 5 1 2 2 15 8 6 5 10 3 3 5 1 12 6 2 5
8 19 1 4 1 18 4 1 1 2 10 4 8 5 8 11 2 2 4 3 2 4 1 23 4
18 1 5 19 1 5 1 4 6 4 1 17 3 16 1 3 11 10 1 2 5 1 1 5 1
4 16 5 2 1 1 2 7 6 4 2 3 6 12 15 1 17 8 6 4 9 2 1 2 21
7 9 18 28 5 26 8 1 12 10 11 1 11 21 3 3 8 8 5 8 7 1 6 6 1
1 9 20 12 1 11 2 1 7 1 20 1 13 1 2 31 2 2 1 1 8 10 1 4 10
2 1 3 1 12 9 1 4 3 1 5 5 5 2 3 4 3 2 3 6 6 4 13 3 1
5 6 14 4 5 6 3 8 1 2 4 20 4 11 4 12 5 1 19 6 3 9 17 10 3
3 18 2 2 8 20 8 3 9 5 12 5 5 6 9 5 1 2 2 4 1 12 5 9 1
6 1 19 22 3 5 3 7 8 10 12 8 1 5 2 1 6 13 6 12 1 2 1 4 8
7 5 19 5 2 7 1 1 7 1 13 2 6 1 7 1 6 11 2 1 2 2 8 2 2
9 1 2 15 2 18 12 4 5 12 1 31 4 3 3 11 3 3 6 2 7 1 6 25 2
5 4 15 3 9 11 5 11 3 9 10 6 6 2 1 11 2 1 9 14 15 2 19 2 2
8 5 8 4 9 7 2 1 2 2 4 1 12 11 6 4 8 14 6 10 3 5 5 2 9
1 1 4 7 4 5 11 13 8 5 2 3 1 5 2 13 2 25 3 8 4 4 15 4 1
2 10 10 8 8 9 1 1 7 8 3 5 4 4 2 5 1 6 2 2 2 1 15 3 1
7 8 1 18 3 7 5 9 3 10 5 5 4 5 9 19 2 7 2 4 6 5 7 1 12
2 2 2 1 1 1 1 4 12 3 1 11 1 1 10 1 1 7 17 4 6 3 5 9 7
1 1 4 1 6 6 7 3 4 8 6 14 14 5 2 8 2 6 4 9 5 6 9 1 1
Now we can use this dataset to estimate probabilities from the relative frequencies:
| k | 1 | 2 | 3 | 4 | 5 | ... |
|---|---|---|---|---|---|---|
| P(X=k) | 0.182 | 0.132 | 0.078 | 0.086 | 0.098 | ... |
• sample mean = 6.192
• standard deviation s=5.50
• third quartile Q3=8.0
• 95th Percentile P95=18.0
Because these are computed from a sample, they are statistics.
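To see how such sample statistics arise, one can simulate the experiment. The sketch below does not use the data listed above; it generates its own 500 values with Python's pseudo-random number generator, so its numbers will differ from the ones in the text. The seed, the helper `rolls_until_six`, and the choice of `statistics.quantiles` (whose default interpolation rule may differ from the one used for the values above) are assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def rolls_until_six(rng: random.Random) -> int:
    """Roll a simulated fair die until a six appears; return the number of rolls."""
    rolls = 1
    while rng.randint(1, 6) != 6:
        rolls += 1
    return rolls


rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
sample = [rolls_until_six(rng) for _ in range(500)]

# Relative frequencies estimate P(X=k) for k = 1..5
counts = Counter(sample)
rel_freq = {k: counts[k] / len(sample) for k in range(1, 6)}

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# n=100 gives 99 cut points; index 74 is the 75th and index 94 the 95th percentile
cuts = statistics.quantiles(sample, n=100)
q3, p95 = cuts[74], cuts[94]

print(rel_freq)
print(sample_mean, sample_sd, q3, p95)
```

Changing the seed gives a different sample and therefore different statistics, while the parameters of the population stay the same.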
| k | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(X=k) (theory) | 0.167 | 0.139 | 0.116 | 0.096 | 0.080 |
| P(X=k) (sample) | 0.182 | 0.132 | 0.078 | 0.086 | 0.098 |
or we can do this by looking at some summaries:
| | Mean | s | Q3 | P95 |
|---|---|---|---|---|
| Population | 6.0 | 5.48 | 8 | 16 |
| Sample | 6.192 | 5.50 | 8 | 18 |
It seems our die is pretty much a fair die.
The most important feature of the scientific method is that any scientific theory has to be falsifiable: it has to be possible to carry out experiments and compare their results to the predictions made by the theory. If they agree, the theory looks good; if not, we need to change the theory or even find a new one. But how do we decide whether or not they "agree"? That is one place where Statistics comes into play.
• Theory: our die is fair
• predictions made using this theory: P(X=1)=0.167, m=6.0, ...
• carry out an experiment (6, 1, 7, 4, 1, ...)
• compare predictions with results of the experiment
P(X=1)=0.167 (theory), P(X=1)=0.182 (experiment)
m=6.0 (theory), sample mean=6.192 (experiment)
• do they agree or is the theory bad?
Note: most "theories" we look at are not big scientific theories but simple claims such as "Our new drug works better than the currently available one".
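As a rough illustration of what "agree" could mean for the die example, one possible yardstick is the standard error of the sample mean. This is an assumption made here, not a method prescribed by the text. The sketch takes the sample size to be the 500 values listed earlier and uses the fair-die values for the mean and standard deviation; the names `n`, `mu`, `sigma`, `xbar`, `se` and `z` are illustrative.

```python
import math

n = 500                               # number of values in the sample listed earlier
mu = 6.0                              # mean predicted by the "fair die" theory
sigma = math.sqrt(5 / 6) / (1 / 6)    # standard deviation predicted by the theory
xbar = 6.192                          # mean observed in the experiment

se = sigma / math.sqrt(n)             # typical size of the chance error in a sample mean
z = (xbar - mu) / se                  # observed difference, measured in standard errors

print(round(se, 3), round(z, 2))      # prints 0.245 0.78
```

Under these assumptions the observed mean lies less than one standard error from the predicted mean, which is the kind of difference that chance alone can easily produce.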