Distributions Arising in Statistics

In this chapter we briefly discuss some distributions that often come up in Statistics.

Chisquare Distribution

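The first of these, the chi-square distribution, was mentioned earlier as a special case of the Gamma distribution. In the context of statistics it arises as follows:
Say Z~N(0,1) and let X=Z². Then for x>0, with Φ and φ denoting the cdf and the density of the N(0,1),

\[ F_X(x) = P(Z^2 \le x) = P(-\sqrt{x} \le Z \le \sqrt{x}) = 2\Phi(\sqrt{x}) - 1 \]

\[ f_X(x) = \frac{\varphi(\sqrt{x})}{\sqrt{x}} = \frac{1}{\sqrt{2\pi}}\, x^{-1/2} e^{-x/2} \]

which is the density of a chi-square distribution with 1 degree of freedom, X~χ²(1). More generally, if Z1, .., Zn are iid N(0,1), then Z1² + .. + Zn² ~ χ²(n).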

We have the following properties of a χ²: if X~χ²(n) then

\[ E[X] = n, \qquad \mathrm{Var}[X] = 2n \]

Say X~χ²(n), Y~χ²(m) and X and Y are independent. Then

\[ X + Y \sim \chi^2(n+m) \]
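The properties of the chi-square can be checked by simulation. The following is a minimal sketch in Python (standard library only), assuming we generate a χ²(n) as a sum of n squared standard normals and compare with the standard facts E[X]=n, Var[X]=2n and X+Y~χ²(n+m) for independent X and Y. The function names, seeds and number of repetitions are illustrative choices, not part of the course routines.

```python
import random
import statistics


def simulate_chisquare(n, reps=20000, seed=1):
    """Draw `reps` values of Z_1^2 + ... + Z_n^2 with Z_i iid N(0,1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]


def summarize(n, m, reps=20000):
    """Compare simulated moments of X ~ chisq(n) and X + Y with theory."""
    x = simulate_chisquare(n, reps, seed=1)
    y = simulate_chisquare(m, reps, seed=2)  # separate generator, so y does not reuse x's draws
    s = [a + b for a, b in zip(x, y)]
    return {
        "mean_x": statistics.fmean(x),      # theory: n
        "var_x": statistics.variance(x),    # theory: 2n
        "mean_sum": statistics.fmean(s),    # theory: n + m
        "var_sum": statistics.variance(s),  # theory: 2(n + m)
    }


if __name__ == "__main__":
    for name, value in summarize(3, 5).items():
        print(f"{name}: {value:.3f}")
```

With n=3 and m=5 the four values should come out close to 3, 6, 8 and 16, up to simulation error.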

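Say X1, .., Xn are iid N(μ,σ). The sample variance is defined by

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

Now

\[ \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1) \]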

Note: we use "n-1" instead of "n" because then S² is an unbiased estimator of σ², that is E[S²]=σ².
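The effect of the divisor can be seen in a small simulation. This is a sketch under assumed values (n=5, μ=1, σ=2, all chosen only for illustration): it averages the sum of squared deviations divided by n-1 and by n over many samples. The first average should be near σ²=4, the second near (n-1)σ²/n=3.2.

```python
import random


def average_variance_estimates(n, mu=1.0, sigma=2.0, reps=20000, seed=3):
    """Return the average of S^2 (divisor n-1) and of the divisor-n version."""
    rng = random.Random(seed)
    unbiased_total = 0.0
    biased_total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        unbiased_total += ss / (n - 1)
        biased_total += ss / n
    return unbiased_total / reps, biased_total / reps


if __name__ == "__main__":
    unbiased, biased = average_variance_estimates(5)
    print(f"divisor n-1: {unbiased:.3f}   divisor n: {biased:.3f}   sigma^2: 4.000")
```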

Note: another important feature here is that S² and the sample mean X̄ are independent.

Student's t Distribution (by W.S. Gosset)

Say X~N(0,1), Y~χ²(n) and X and Y are independent. Then

\[ T_n = \frac{X}{\sqrt{Y/n}} \]

has a Student's t distribution with n degrees of freedom. We have E[Tn]=0 if n>1 (the mean does not exist if n=1) and Var[Tn]=n/(n-2) if n>2 (the variance is not finite if n≤2).

Notation: Tn ~ t(n)
The importance of this distribution in Statistics comes from the following: Say X1, .., Xn are iid N(μ,σ). Then

\[ \sqrt{n}\, \frac{\bar{X} - \mu}{S} \sim t(n-1) \]

Note: S is of course an estimate of the population standard deviation, so this formula tries to standardize the sample mean without knowing the exact standard deviation. If our sample is large (?) we would expect S to be close to σ and so we would expect a t(n) distribution to be close to a N(0,1). This is in fact true. The routine simt illustrates this point.
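The convergence to the normal can also be illustrated with a short simulation. The sketch below is in Python and is not the simt routine mentioned above. It assumes we compute the standardized mean √n(X̄-μ)/S for many normal samples and record how often it exceeds 1.96 in absolute value. For a N(0,1) this happens about 5% of the time. For a small sample the t distribution has much heavier tails, so the frequency is noticeably larger.

```python
import math
import random


def t_statistic(sample, mu):
    """Standardize the sample mean using the sample standard deviation S."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return math.sqrt(n) * (xbar - mu) / s


def tail_frequency(n, cutoff=1.96, reps=10000, mu=0.0, sigma=1.0, seed=4):
    """Proportion of simulated t statistics with absolute value above `cutoff`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(f"n={n:3d}: P(|T| > 1.96) is roughly {tail_frequency(n):.3f}")
```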

An important special case is X~t(1). This is also called the Cauchy distribution. Notice it has no finite mean (and of course then also no finite variance). It has density

\[ f(x) = \frac{1}{\pi (1 + x^2)}, \qquad -\infty < x < \infty \]

Snedecor's F distribution

Say X~χ²(n), Y~χ²(m) and X and Y are independent. Then

\[ F = \frac{X/n}{Y/m} \]

is said to have an F distribution with n and m degrees of freedom.
We have E[F] = m/(m-2) if m>2 (no mention of n!).
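Say X1, .., Xn are iid normal with mean μ1 and variance σ1², and Y1, .., Ym are iid normal with mean μ2 and variance σ2². Furthermore Xi and Yj are independent for all i and j. Then, with S_X² and S_Y² the sample variances of the two samples,

\[ \frac{S_X^2/\sigma_1^2}{S_Y^2/\sigma_2^2} \sim F(n-1, m-1) \]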
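The fact that the mean of an F distribution depends only on the second degrees of freedom can be checked numerically. The sketch below assumes we build F as a ratio of two independent chi-squares, each divided by its degrees of freedom and each generated as a sum of squared standard normals. The helper names and the choice m=10 are illustrative. For m=10 the average should be close to 10/8=1.25 whatever n is.

```python
import random


def chisquare_draw(rng, df):
    """One draw from a chi-square with `df` degrees of freedom."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


def mean_f(n, m, reps=20000, seed=5):
    """Average of simulated F = (X/n)/(Y/m) with X ~ chisq(n), Y ~ chisq(m)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = chisquare_draw(rng, n)
        y = chisquare_draw(rng, m)
        total += (x / n) / (y / m)
    return total / reps


if __name__ == "__main__":
    m = 10
    for n in (2, 5, 20):
        print(f"n={n:2d}, m={m}: simulated mean {mean_f(n, m):.3f}, theory {m / (m - 2):.3f}")
```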

Order Statistics

Many statistical methods, for example the median and the range, are based on an ordered data set. In this section we study some of the common distributions of order statistics.

One of the difficulties when dealing with order statistics is ties, that is, the same observation appearing more than once. In theory this can only occur for discrete data, because for continuous data the probability of a tie is zero. Ties may happen anyway because of rounding, but we will ignore them in what follows.
Say X1, .., Xn are iid with density f. Then X(i) is the ith order statistic if X(1) < ... < X(i) < ... < X(n).
Note X(1) = min {Xi} and X(n) = max {Xi}.

Let's find the pdf of X(i). For this let Y be a r.v. that counts the number of Xj ≤ x for some fixed number x. We can think of Y as the number of "successes" in n independent Bernoulli trials with success probability p = P(Xi ≤ x) = F(x) for i=1,..,n. So Y~B(n,F(x)). Note also that the event {Y ≥ i} means that at least i observations are less than or equal to x, so the ith smallest is less than or equal to x. Therefore

\[ F_{X_{(i)}}(x) = P(Y \ge i) = \sum_{k=i}^{n} \binom{n}{k} F(x)^k (1 - F(x))^{n-k} \]

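With that we find, after differentiating with respect to x (the sum telescopes),

\[ f_{X_{(i)}}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, F(x)^{i-1} (1 - F(x))^{n-i} f(x) \]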

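Example: Say X1, .., Xn are iid U[0,1]. Then for 0<x<1 we have f(x)=1 and F(x)=x. Therefore

\[ f_{X_{(i)}}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, x^{i-1} (1 - x)^{n-i}, \qquad 0 < x < 1 \]

so X(i) ~ Beta(i, n-i+1).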
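The binomial argument for the cdf of X(i) can be checked against simulated uniform data. In this sketch the function names are illustrative. order_stat_cdf evaluates P(Y ≥ i) for Y~B(n,p) with p=F(x), and the simulation sorts n uniforms and picks the ith smallest. For the U[0,1] case the simulated mean can also be compared with i/(n+1), the mean of a Beta(i, n-i+1), which is the standard name for the density of a uniform order statistic.

```python
import math
import random


def order_stat_cdf(i, n, p):
    """P(X_(i) <= x) where p = F(x): the probability that Y >= i for Y ~ B(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, n + 1))


def simulate_order_stat(i, n, reps=20000, seed=6):
    """Simulate the ith smallest of n iid U[0,1] observations, `reps` times."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(n))[i - 1] for _ in range(reps)]


def proportion_at_most(values, x):
    """Proportion of simulated values that are <= x."""
    return sum(v <= x for v in values) / len(values)


if __name__ == "__main__":
    i, n, x = 2, 5, 0.4
    draws = simulate_order_stat(i, n)
    print(f"simulated P(X(i) <= {x}): {proportion_at_most(draws, x):.3f}")
    print(f"binomial formula:        {order_stat_cdf(i, n, x):.3f}")
    print(f"simulated mean: {sum(draws) / len(draws):.3f}, i/(n+1): {i / (n + 1):.3f}")
```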

Empirical Distribution Function

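The empirical distribution function of a sample X1, .., Xn is defined as follows, where I(Xi ≤ x) is 1 if Xi ≤ x and 0 otherwise:

\[ \hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) \]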

so it is the sample equivalent of the regular distribution function:

• F(x)=P(X≤x) is the probability that the rv X≤x
• Fhat(x) is the proportion of X1, .., Xn ≤x
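The description of Fhat as a proportion translates directly into code. The following is a minimal sketch in Python. The name make_ecdf is an illustrative choice and this is not the empcdf routine used in the course. It sorts the sample once and then counts the observations ≤ x with a binary search.

```python
import bisect


def make_ecdf(sample):
    """Return the function Fhat, where Fhat(x) is the proportion of sample values <= x."""
    if not sample:
        raise ValueError("sample must contain at least one observation")
    ordered = sorted(sample)
    n = len(ordered)

    def fhat(x):
        # bisect_right returns the number of ordered values that are <= x
        return bisect.bisect_right(ordered, x) / n

    return fhat


if __name__ == "__main__":
    fhat = make_ecdf([3, 1, 2, 2])
    for x in (0, 1, 2, 2.5, 3):
        print(f"Fhat({x}) = {fhat(x)}")
```

Note that Fhat is a step function that jumps at the observed values, and that ties (here the two 2's) simply produce a larger jump.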

The empirical distribution function is very important in Statistics. In R it is drawn with the routine empcdf. In normal.emp we draw the true cdf and the empirical cdf for the N(0,1).

Note

In emp.ex we do the histograms for F=U[0,1] (which=1), F=Pois(1) (which=2) and F=Exp(1) (which=3)