Distributions Arising in Statistics

In this chapter we briefly discuss some distributions that often come up in Statistics.

Chi-square Distribution

The first of these, the chi-square distribution, we already mentioned as a special case of the Gamma distribution. In the context of statistics it arises as follows:
If $X \sim N(0,1)$, then $X^2 \sim \chi^2(1)$.
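A quick Monte Carlo check of this fact, as a minimal sketch in Python using only the standard library (the function name `simulate_chisq1`, the seed and the number of repetitions are arbitrary choices for illustration). Squaring standard normal draws should give values whose mean is close to 1 and whose variance is close to 2, the mean and variance of a $\chi^2(1)$.

```python
import random
import statistics


def simulate_chisq1(reps=100_000, seed=1):
    """Return reps draws of X**2 where X ~ N(0, 1); these should follow chi-square(1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(reps)]


samples = simulate_chisq1()
sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)
# chi-square(1) has mean 1 and variance 2, so both numbers should be close to those values.
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```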

We have the following property of the $\chi^2$ distribution:
Say $X \sim \chi^2(n)$, $Y \sim \chi^2(m)$, and $X$ and $Y$ are independent. Then
$$X + Y \sim \chi^2(n+m)$$
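A minimal simulation sketch of this additivity property. It assumes Python's standard-library `random.gammavariate(alpha, beta)` (shape `alpha`, scale `beta`) and the Gamma representation $\chi^2(k) = \mathrm{Gamma}(\text{shape } k/2, \text{scale } 2)$ mentioned above. The degrees of freedom $n = 3$, $m = 5$ and the seed are arbitrary. If $X + Y \sim \chi^2(n+m)$, its mean should be near $n+m = 8$ and its variance near $2(n+m) = 16$.

```python
import random
import statistics


def chisq_draw(df, rng):
    """One chi-square(df) draw via the Gamma(shape=df/2, scale=2) representation."""
    return rng.gammavariate(df / 2, 2.0)


rng = random.Random(2)
n, m = 3, 5
sums = [chisq_draw(n, rng) + chisq_draw(m, rng) for _ in range(50_000)]

sum_mean = statistics.fmean(sums)
sum_var = statistics.pvariance(sums)
# A chi-square(n + m) variable has mean n + m = 8 and variance 2(n + m) = 16.
print(f"mean = {sum_mean:.2f}, variance = {sum_var:.2f}")
```

Drawing $X$ and $Y$ from the Gamma representation, rather than as sums of squared normals, keeps the check from assuming the result it is meant to illustrate.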
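Say $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$. Define the sample variance by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i;$$
then $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$.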
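A minimal simulation sketch of this result in Python, using only the standard library. The helper names, $n = 6$, $\mu = 10$, $\sigma = 2$ and the seed are illustrative choices. The helper computes $S^2$ with the $n-1$ divisor exactly as defined above. If $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$, the simulated values should have mean near $n-1 = 5$ and variance near $2(n-1) = 10$.

```python
import random
import statistics


def sample_variance(xs):
    """S^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def scaled_variance_draws(n, mu, sigma, reps, seed):
    """Return reps values of (n - 1) * S^2 / sigma^2 from iid N(mu, sigma^2) samples of size n."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        draws.append((n - 1) * sample_variance(xs) / sigma**2)
    return draws


draws = scaled_variance_draws(n=6, mu=10.0, sigma=2.0, reps=40_000, seed=3)
# Expect mean close to n - 1 = 5 and variance close to 2(n - 1) = 10.
print(f"mean = {statistics.fmean(draws):.2f}, variance = {statistics.pvariance(draws):.2f}")
```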

Note: we divide by $n-1$ instead of $n$ because then $S^2$ is an unbiased estimator of $\sigma^2$, that is, $E[S^2] = \sigma^2$.
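To see the effect of the divisor, here is a small simulation sketch in Python using only the standard library. The values $n = 5$, $\sigma^2 = 4$ and the seed are arbitrary. Averaged over many samples, the $n-1$ version should come out near $\sigma^2 = 4$. Dividing by $n$ instead should come out near $\frac{n-1}{n}\sigma^2 = 3.2$, which is biased low.

```python
import random


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


rng = random.Random(4)
n, sigma = 5, 2.0
reps = 50_000
total_unbiased = 0.0  # running sum of estimates using divisor n - 1
total_biased = 0.0    # running sum of estimates using divisor n
for _ in range(reps):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    ss = sum_sq_dev(xs)
    total_unbiased += ss / (n - 1)
    total_biased += ss / n

avg_unbiased = total_unbiased / reps  # should be near sigma^2 = 4
avg_biased = total_biased / reps      # should be near (n - 1) / n * sigma^2 = 3.2
print(f"divisor n-1: {avg_unbiased:.3f}, divisor n: {avg_biased:.3f}")
```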

Note: another important feature here is that $S^2$ and $\bar{X}$ are independent.
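Independence cannot be verified by simulation. A necessary consequence can be checked, however: $\bar{X}$ and $S^2$ should have zero correlation. The sketch below (Python, standard library only; the helper names, sample size $n = 5$ and seed are illustrative) estimates that correlation for normal samples. For contrast, it also does so for exponential samples, where $\bar{X}$ and $S^2$ are not independent and the correlation is clearly positive.

```python
import math
import random


def mean_and_var(xs):
    """Return (xbar, S^2) with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return xbar, sum((x - xbar) ** 2 for x in xs) / (n - 1)


def correlation(us, vs):
    """Pearson correlation of two equal-length lists."""
    k = len(us)
    mu, mv = sum(us) / k, sum(vs) / k
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    su = math.sqrt(sum((u - mu) ** 2 for u in us))
    sv = math.sqrt(sum((v - mv) ** 2 for v in vs))
    return cov / (su * sv)


def mean_var_correlation(draw, n=5, reps=20_000, seed=5):
    """Correlation between the sample mean and S^2 over reps samples of size n."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        xbar, s2 = mean_and_var([draw(rng) for _ in range(n)])
        means.append(xbar)
        variances.append(s2)
    return correlation(means, variances)


corr_normal = mean_var_correlation(lambda rng: rng.gauss(0.0, 1.0))
corr_expo = mean_var_correlation(lambda rng: rng.expovariate(1.0))
# Normal samples: near 0. Exponential samples: clearly positive.
print(f"normal: {corr_normal:.3f}, exponential: {corr_expo:.3f}")
```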

Student's t Distribution (by W.S. Gosset)

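Say $X \sim N(0,1)$, $Y \sim \chi^2(n)$, and $X$ and $Y$ are independent. Then
$$T_n = \frac{X}{\sqrt{Y/n}}$$
has a Student's t distribution with $n$ degrees of freedom. We have $E[T_n] = 0$ if $n > 1$ (it does not exist if $n = 1$) and $\mathrm{Var}(T_n) = n/(n-2)$ if $n > 2$ (it does not exist if $n \le 2$).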
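A minimal sketch that builds $T_n = X/\sqrt{Y/n}$ directly from the definition, in Python with the standard library only. Here $Y$ is drawn with `random.gammavariate(n / 2, 2.0)`, which gives a $\chi^2(n)$ via its Gamma representation. The choice $n = 10$ and the seed are arbitrary. The sample mean should be near 0 and the sample variance near $n/(n-2) = 1.25$.

```python
import math
import random
import statistics


def t_draw(n, rng):
    """One draw of T_n = X / sqrt(Y / n) with X ~ N(0,1) independent of Y ~ chi-square(n)."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gammavariate(n / 2, 2.0)
    return x / math.sqrt(y / n)


rng = random.Random(6)
n = 10
ts = [t_draw(n, rng) for _ in range(100_000)]
t_mean = statistics.fmean(ts)
t_var = statistics.pvariance(ts)
# Theory: E[T_n] = 0 and Var(T_n) = n / (n - 2) = 1.25 for n = 10.
print(f"mean = {t_mean:.3f}, variance = {t_var:.3f}")
```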

Notation: $T_n \sim t(n)$.

The importance of this distribution in Statistics comes from the following: say $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$. Then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$$

Note: $S$ is of course an estimate of the population standard deviation $\sigma$, so this formula standardizes the sample mean without knowing the exact standard deviation. If our sample is large, we would expect $S$ to be close to $\sigma$, and so we would expect a $t(n)$ distribution with large $n$ to be close to $N(0,1)$. This is in fact true: as $n \to \infty$, $t(n)$ converges to $N(0,1)$. The routine simt illustrates this point.
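The routine simt itself is not reproduced here. As a rough stand-in, the hypothetical Python sketch below (standard library only; the function name `tail_fraction`, sample sizes and seed are illustrative) simulates the standardized mean $(\bar{X}-\mu)/(S/\sqrt{n})$ from normal samples. It records how often the value exceeds 1.96 in absolute value. For a $N(0,1)$ this happens 5% of the time. For small $n$ the heavier tails of the $t$ distribution make it noticeably more frequent, while for large $n$ the fraction approaches 5%.

```python
import math
import random


def tail_fraction(n, reps=10_000, mu=0.0, sigma=1.0, seed=7):
    """Fraction of simulated (xbar - mu) / (S / sqrt(n)) values with |value| > 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        if abs((xbar - mu) / (s / math.sqrt(n))) > 1.96:
            hits += 1
    return hits / reps


small = tail_fraction(n=5)    # t(4): true tail probability is about 0.12
large = tail_fraction(n=100)  # t(99): close to the normal value 0.05
print(f"n = 5: {small:.3f}, n = 100: {large:.3f}")
```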

An important special case is $X \sim t(1)$, also called the Cauchy distribution. Notice that it has no finite mean (and therefore also no finite variance). It has density
$$f(x) = \frac{1}{\pi(1+x^2)}, \qquad -\infty < x < \infty$$
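A minimal Python sketch of two facts about the Cauchy distribution (standard library only; names, sizes and seed are illustrative). First, integrating the density $1/(\pi(1+x^2))$ numerically over $[-1, 1]$ gives $\frac{2}{\pi}\arctan(1) = 0.5$. Second, because there is no finite mean, averaging does not help. The sketch draws $t(1)$ values as $X/\sqrt{Y}$ with $Y = Z^2 \sim \chi^2(1)$. The mean of 200 such draws still exceeds 1 in absolute value about half the time, just like a single draw, since the average of iid standard Cauchy variables is again standard Cauchy.

```python
import math
import random


def cauchy_pdf(x):
    """Density of the t(1) = Cauchy distribution: 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def midpoint_integral(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


def cauchy_draw(rng):
    """t(1) draw: X / sqrt(Y / 1) with Y = Z^2 ~ chi-square(1), i.e. X / |Z|."""
    return rng.gauss(0.0, 1.0) / abs(rng.gauss(0.0, 1.0))


def fraction_large_means(n, reps=2_000, seed=8):
    """Fraction of sample means (of n Cauchy draws) with absolute value above 1."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        if abs(sum(cauchy_draw(rng) for _ in range(n)) / n) > 1:
            count += 1
    return count / reps


central_mass = midpoint_integral(cauchy_pdf, -1.0, 1.0)  # exact value: 0.5
frac_single = fraction_large_means(n=1)
frac_200 = fraction_large_means(n=200)
print(f"P(|X| <= 1) = {central_mass:.6f}, single draw: {frac_single:.3f}, mean of 200: {frac_200:.3f}")
```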

Snedecor's F distribution

Say $X \sim \chi^2(n)$, $Y \sim \chi^2(m)$, and $X$ and $Y$ are independent. Then
$$F = \frac{X/n}{Y/m}$$
is said to have an F distribution with $n$ and $m$ degrees of freedom.
We have $E[F] = m/(m-2)$ for $m > 2$ (note that it does not depend on $n$).
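A simulation sketch of the F distribution built from its definition, in Python with the standard library only. The $\chi^2$ draws use `random.gammavariate(df / 2, 2.0)`, and the degrees of freedom and seed are arbitrary. With $m = 10$ the mean should be near $m/(m-2) = 1.25$ whether the numerator has $n = 2$ or $n = 20$ degrees of freedom.

```python
import random
import statistics


def f_draw(n, m, rng):
    """One draw of F = (X/n) / (Y/m) with independent X ~ chi-square(n), Y ~ chi-square(m)."""
    x = rng.gammavariate(n / 2, 2.0)
    y = rng.gammavariate(m / 2, 2.0)
    return (x / n) / (y / m)


def f_mean(n, m, reps=100_000, seed=9):
    """Average of reps simulated F(n, m) draws."""
    rng = random.Random(seed)
    return statistics.fmean(f_draw(n, m, rng) for _ in range(reps))


m = 10
mean_n2 = f_mean(2, m)
mean_n20 = f_mean(20, m)
# Both should be close to m / (m - 2) = 1.25: the mean does not depend on n.
print(f"n = 2: {mean_n2:.3f}, n = 20: {mean_n20:.3f}, theory: {m / (m - 2):.3f}")
```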
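Say $X_1, \ldots, X_n$ are iid $N(\mu_1, \sigma_1^2)$ and $Y_1, \ldots, Y_m$ are iid $N(\mu_2, \sigma_2^2)$. Furthermore, $X_i$ and $Y_j$ are independent for all $i$ and $j$. Then
$$\frac{S_X^2/\sigma_1^2}{S_Y^2/\sigma_2^2} \sim F(n-1, m-1)$$
where $S_X^2$ and $S_Y^2$ are the sample variances of the $X$'s and the $Y$'s.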
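A minimal sketch checking this result by simulation, in Python with the standard library only. The sample sizes $n = 8$, $m = 12$, the parameters $\mu_1 = 0$, $\sigma_1 = 1$, $\mu_2 = 5$, $\sigma_2 = 3$ and the seed are arbitrary. If the scaled variance ratio is $F(n-1, m-1) = F(7, 11)$, its mean should be near $11/9 \approx 1.222$. This uses $E[F] = m/(m-2)$ with denominator degrees of freedom $m - 1 = 11$.

```python
import random
import statistics


def sample_variance(xs):
    """S^2 with the n - 1 divisor."""
    k = len(xs)
    xbar = sum(xs) / k
    return sum((x - xbar) ** 2 for x in xs) / (k - 1)


def variance_ratio_draw(n, m, mu1, sigma1, mu2, sigma2, rng):
    """(S_X^2 / sigma1^2) / (S_Y^2 / sigma2^2) for independent normal samples."""
    xs = [rng.gauss(mu1, sigma1) for _ in range(n)]
    ys = [rng.gauss(mu2, sigma2) for _ in range(m)]
    return (sample_variance(xs) / sigma1**2) / (sample_variance(ys) / sigma2**2)


rng = random.Random(10)
n, m = 8, 12
ratios = [variance_ratio_draw(n, m, 0.0, 1.0, 5.0, 3.0, rng) for _ in range(20_000)]
ratio_mean = statistics.fmean(ratios)
# F(n-1, m-1) = F(7, 11) has mean 11 / 9, about 1.222.
print(f"simulated mean = {ratio_mean:.3f}, theory = {(m - 1) / (m - 3):.3f}")
```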

Order Statistics

Many statistical methods, for example the median and the range, are based on an ordered data set. In this section we study the distributions of some common order statistics.

One of the difficulties when dealing with order statistics is ties, that is, the same observation appearing more than once. For continuous data the probability of a tie is zero, so ties should only occur for discrete data. They may happen anyway because of rounding, but we will ignore them in what follows.
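Say $X_1, \ldots, X_n$ are iid with density $f$ and cdf $F$. Then $X_{(i)}$ is the $i$th order statistic, where $X_{(1)} < \cdots < X_{(i)} < \cdots < X_{(n)}$ are the observations sorted in increasing order.
Note $X_{(1)} = \min\{X_i\}$ and $X_{(n)} = \max\{X_i\}$.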

Let's find the pdf of $X_{(i)}$. For this, fix a number $x$ and let $Y$ be the r.v. that counts the number of $X_j \le x$. We can think of $Y$ as the number of "successes" in $n$ independent Bernoulli trials with success probability $p = P(X_j \le x) = F(x)$, so $Y \sim B(n, F(x))$. Note also that the event $\{Y \ge i\}$ means that at least $i$ observations are less than or equal to $x$, so the $i$th smallest is less than or equal to $x$. Therefore
$$F_{X_{(i)}}(x) = P(X_{(i)} \le x) = P(Y \ge i) = \sum_{k=i}^{n} \binom{n}{k} F(x)^k \left(1 - F(x)\right)^{n-k}$$
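The binomial-sum formula translates directly into code. The sketch below (Python, standard library only) uses a hypothetical helper `order_stat_cdf` and an arbitrary test case: Exp(1) data with $n = 5$, $i = 2$, $x = 0.3$. It evaluates $P(X_{(i)} \le x)$ from the formula and compares it with the fraction of simulated samples whose second smallest value is at most $x$.

```python
import math
import random


def order_stat_cdf(x, i, n, cdf):
    """P(X_(i) <= x) = P(Y >= i) with Y ~ Binomial(n, F(x))."""
    p = cdf(x)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i, n + 1))


def expo_cdf(x):
    """CDF of the Exp(1) distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


n, i, x = 5, 2, 0.3
formula = order_stat_cdf(x, i, n, expo_cdf)

rng = random.Random(11)
reps = 20_000
# sorted(...)[i - 1] is the i-th smallest observation, X_(i).
hits = sum(sorted(rng.expovariate(1.0) for _ in range(n))[i - 1] <= x for _ in range(reps))
simulated = hits / reps
print(f"formula = {formula:.4f}, simulation = {simulated:.4f}")
```

For $i = 1$ the sum reduces to $1 - (1 - F(x))^n$, the cdf of the minimum. For $i = n$ it reduces to $F(x)^n$, the cdf of the maximum.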
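With that, differentiating with respect to $x$ (the terms of the resulting sum telescope), we find
$$f_{X_{(i)}}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, F(x)^{i-1} \left(1 - F(x)\right)^{n-i} f(x)$$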
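A small numerical check of this density, in Python with the standard library only. The helper names and the Exp(1) test case with $n = 5$, $i = 2$ are illustrative. A central finite difference of the binomial-sum cdf should match the closed-form pdf, and the pdf should integrate to about 1.

```python
import math


def order_stat_cdf(x, i, n, cdf):
    """P(X_(i) <= x) as a binomial tail sum."""
    p = cdf(x)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i, n + 1))


def order_stat_pdf(x, i, n, cdf, pdf):
    """n! / ((i-1)! (n-i)!) * F(x)^(i-1) * (1 - F(x))^(n-i) * f(x)."""
    const = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    p = cdf(x)
    return const * p ** (i - 1) * (1 - p) ** (n - i) * pdf(x)


def expo_cdf(x):
    """CDF of the Exp(1) distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def expo_pdf(x):
    """Density of the Exp(1) distribution."""
    return math.exp(-x) if x > 0 else 0.0


n, i, x, h = 5, 2, 0.3, 1e-5
finite_diff = (order_stat_cdf(x + h, i, n, expo_cdf) - order_stat_cdf(x - h, i, n, expo_cdf)) / (2 * h)
closed_form = order_stat_pdf(x, i, n, expo_cdf, expo_pdf)

# Midpoint rule on [0, 40]; the Exp(1) tail beyond 40 is negligible.
steps = 40_000
width = 40.0 / steps
total = width * sum(order_stat_pdf((k + 0.5) * width, i, n, expo_cdf, expo_pdf) for k in range(steps))
print(f"finite difference = {finite_diff:.6f}, closed form = {closed_form:.6f}, integral = {total:.6f}")
```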

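Example: Say $X_1, \ldots, X_n$ are iid $U[0,1]$. Then for $0 < x < 1$ we have $f(x) = 1$ and $F(x) = x$. Therefore
$$f_{X_{(i)}}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, x^{i-1} (1-x)^{n-i}, \qquad 0 < x < 1,$$
that is, $X_{(i)} \sim \mathrm{Beta}(i, n-i+1)$.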
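A simulation sketch of the uniform case, in Python with the standard library only. The choice $n = 4$, the number of repetitions and the seed are arbitrary. Sorting each sample gives $X_{(1)}, \ldots, X_{(n)}$. In this case $X_{(i)}$ has the Beta$(i, n-i+1)$ density, whose mean is $i/(n+1)$. So the average of $X_{(i)}$ over many samples should be close to $0.2, 0.4, 0.6, 0.8$.

```python
import random


def order_stat_means(n, reps=40_000, seed=12):
    """Average of each order statistic X_(1), ..., X_(n) over reps U[0,1] samples of size n."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        for idx, value in enumerate(sorted(rng.random() for _ in range(n))):
            totals[idx] += value
    return [t / reps for t in totals]


n = 4
simulated = order_stat_means(n)
theory = [i / (n + 1) for i in range(1, n + 1)]  # Beta(i, n - i + 1) means
for i, (s, t) in enumerate(zip(simulated, theory), start=1):
    print(f"X_({i}): simulated {s:.3f}, theory {t:.3f}")
```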