Exercise Problems - Final

The following is data on a sample of employees of a store:


Here are some useful numbers:


We will use this dataset for the whole exercise.

Problem 1: Find the mean and the standard deviation of Age


Problem 2:
a) Find the 5 number summary of Wage
b) Say the owner wants to estimate the payroll taxes he has to pay. Should he use the mean or the median to estimate the "average" wage?


Problem 3: Find a 90% confidence interval for the mean age of the employees (assume the data has a normal distribution).


Problem 4: Are there statistically significantly more female than male employees? Test at the 5% level.


Problem 5: Is there a statistically significant difference in the wages of female and male employees? Test at the 10% level. You can assume that the data has a normal distribution.


Problem 6: Say we randomly select 7 of the employees in this sample. What is the probability that at least 4 are female?


Problem 7: What sample size would be needed to find a 90% confidence interval for the mean age of the employees with a margin of error of 3 years?


Problem 8: Say these 14 are a representative sample of all the employees in this company. We are going to randomly select another one (not in the sample). You can assume that the wage of this person has a normal distribution with mean $7.50 and standard deviation $1.80.
a) What is the probability that the wage of this person is between $6.50 and $7.00?
b) What is the Interquartile Range of the wages?


Solutions


Problem 1
n = 14, ∑x = 460
and so x̄ = ∑x/n = 460/14 = 32.86

∑x² = 18544
and so Sxx = ∑x² - (∑x)²/n = 18544 - (460)²/14 = 3429.71
and so s = √(Sxx/(n-1)) = √(3429.71/13) = 16.24
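The arithmetic for Problem 1 can be checked with a short script. This is only a sketch working from the two sums given in the solution; the variable names are illustrative and not part of the exercise.

```python
import math

# Summary figures taken from the solution to Problem 1
n = 14
sum_x = 460        # sum of the ages
sum_x2 = 18544     # sum of the squared ages

mean = sum_x / n                     # sample mean
sxx = sum_x2 - sum_x ** 2 / n        # sum of squared deviations from the mean
s = math.sqrt(sxx / (n - 1))         # sample standard deviation

print(round(mean, 2), round(sxx, 2), round(s, 2))
```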


Problem 2
The ordered dataset is
6.50 6.50 6.50 6.50 6.50 7.25 7.50 7.50 8.90 8.95 9.15 9.75 11.25 12.50
and so we have
Min = 6.50
Q1=P25, np/100 = 14×25/100 = 3.5, round up to 4, so Q1=6.50
Median = (7.50+7.50)/2 = 7.50
Q3=P75, np/100 = 14×75/100 = 10.5, round up to 11, so Q3=9.15
Max = 12.50
so
Min Q1 Median Q3 Max
6.50 6.50 7.50 9.15 12.50
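The percentile rule used above (compute np/100 and round up when it is not a whole number) can be sketched as follows. The helper name `percentile` is hypothetical, and the handling of whole-number positions (averaging that value with the next one) is an assumption chosen to match how the median was computed above.

```python
import math

# Ordered wages as listed in the solution to Problem 2
wages = [6.50, 6.50, 6.50, 6.50, 6.50, 7.25, 7.50, 7.50,
         8.90, 8.95, 9.15, 9.75, 11.25, 12.50]

def percentile(sorted_data, p):
    """p-th percentile by the np/100 position rule (hypothetical helper)."""
    n = len(sorted_data)
    pos = n * p / 100
    if pos == int(pos):
        # Whole-number position: average this value and the next one (assumption)
        k = int(pos)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Otherwise round the position up; subtract 1 for 0-based indexing
    return sorted_data[math.ceil(pos) - 1]

five = (wages[0], percentile(wages, 25), percentile(wages, 50),
        percentile(wages, 75), wages[-1])
print(five)
```

Statistical software often uses other quantile conventions, so its quartiles may differ slightly from the ones found by this rule.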


Problem 3
100(1-α)% = 90%, so α = 0.1, so α/2 = 0.05, so t_{n-1, α/2} = t_{13, 0.05} = 1.7709
x̄ ± t_{n-1, α/2} × s/√n = 32.857 ± 1.7709 × 16.24/√14 = 32.857 ± 7.686
and so a 90% confidence interval for the mean age is (25.17, 40.54)
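A minimal sketch of the interval computation for Problem 3. The critical value 1.7709 is copied from the t table as in the solution, since the Python standard library has no t quantile function; the variable names are illustrative.

```python
import math

n = 14
mean = 460 / n      # sample mean of Age
s = 16.24           # sample standard deviation from Problem 1
t_crit = 1.7709     # t table value for 13 degrees of freedom, upper tail 0.05

margin = t_crit * s / math.sqrt(n)            # half-width of the interval
lower, upper = mean - margin, mean + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```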


Problem 4
Let p be the true proportion of female employees; then p̂ = 8/14 = 0.571
1) Parameter: proportion p
2) Method: based on normal approximation
3) Assumptions: np̂ = 8 > 5 and n(1-p̂) = 6 > 5
4) α = 0.05
5) H0: p = 0.5 (there are equally many male and female employees)
6) Ha: p > 0.5 (there are more female than male employees)
7) Z = (p̂-p0)/√(p0(1-p0)/n) = (0.5714-0.5)/√(0.5×0.5/14) = 0.0714/0.1336 = 0.534
8) We reject H0 if Z > z_α, z_α = z_0.05 = 1.645; Z = 0.534 ≤ 1.645, so we fail to reject H0
9) There is not enough evidence to conclude that there are more female than male employees.
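The test statistic and critical value for Problem 4 can be reproduced with the standard library. This is a sketch under the same normal approximation as the solution; `NormalDist` comes from Python's `statistics` module, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 14
females = 8
p0 = 0.5        # proportion under H0
alpha = 0.05

p_hat = females / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # one-sample z statistic
z_crit = NormalDist().inv_cdf(1 - alpha)          # upper-tail critical value
reject = z > z_crit                               # one-sided test, Ha: p > p0

print(round(z, 3), round(z_crit, 3), reject)
```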


Problem 5
We find
              Female   Male
Sample Size   8        6
Mean          7.66     8.99
s             1.39     2.40
1) Parameters: two means
2) Method: 2-sample t test
3) Assumptions: boxplots by gender indicate data is normal and equal variance is fine
4) α = 0.1
5) H0: μF = μM (mean wages of male and female employees are the same)
6) Ha: μF ≠ μM (mean wages of male and female employees are different)
7) With the pooled standard deviation
sp = √(((n1-1)s1² + (n2-1)s2²)/(n1+n2-2)) = √((7×1.39² + 5×2.40²)/12) = 1.878
T = (x̄F - x̄M)/(sp×√(1/n1 + 1/n2)) = (7.66-8.99)/(1.878×√(1/8+1/6)) = -1.31
8) We reject H0 if |T| > t_{n1+n2-2, α/2}, t_{n1+n2-2, α/2} = t_{12, 0.05} = 1.7823; |T| = 1.31 ≤ 1.7823, so we fail to reject H0
9) There is not enough evidence to conclude that the wages are different.
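As a check on |T| = 1.31, here is a sketch of the equal-variance (pooled) two-sample t statistic computed from the summary table. The pooled form is an assumption inferred from the n1+n2-2 degrees of freedom used in step 8; the critical value 1.7823 is copied from the t table rather than computed, and the variable names are illustrative.

```python
import math

# Summary statistics by gender, from the table in Problem 5
n_f, mean_f, s_f = 8, 7.66, 1.39
n_m, mean_m, s_m = 6, 8.99, 2.40

df = n_f + n_m - 2
# Pooled variance: weighted average of the two sample variances
sp2 = ((n_f - 1) * s_f ** 2 + (n_m - 1) * s_m ** 2) / df
se = math.sqrt(sp2) * math.sqrt(1 / n_f + 1 / n_m)
t_stat = (mean_f - mean_m) / se

t_crit = 1.7823                     # t table value for 12 df, upper tail 0.05
reject = abs(t_stat) > t_crit       # two-sided test at the 10% level

print(round(t_stat, 2), reject)
```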


Problem 6
Let the random variable X be the number of selected employees that are female; then X ~ Bin(7, 8/14). So P(X≥4) = 1 - P(X≤3) = 1 - 0.3469 = 0.6531
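The binomial probability can be evaluated directly from the probability mass function. This sketch assumes the Bin(7, 8/14) model stated in the solution; the helper name `binom_pmf` is hypothetical.

```python
from math import comb

n_draws = 7
p = 8 / 14      # proportion of female employees in the sample

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p) (hypothetical helper)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(X >= 4) as the complement of P(X <= 3)
p_at_most_3 = sum(binom_pmf(k, n_draws, p) for k in range(0, 4))
p_at_least_4 = 1 - p_at_most_3

print(round(p_at_most_3, 4), round(p_at_least_4, 4))
```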

Problem 7
The formula for the sample size of a confidence interval for the mean is
n = (z_{α/2} × σ/E)²
Now
100(1-α)% = 90%, so α = 0.1, so α/2 = 0.05, so z_{α/2} = z_0.05 = 1.645
σ ≈ s = 16.24
E = 3, so
n = (z_{α/2} × σ/E)² = (1.645×16.24/3)² = 79.3, which we round up to 80
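A short sketch of the sample size calculation, using the sample standard deviation from Problem 1 as the estimate of σ, as the solution does. The variable names are illustrative.

```python
import math
from statistics import NormalDist

conf_level = 0.90
sigma = 16.24   # estimated by the sample standard deviation s
E = 3           # desired margin of error in years

alpha = 1 - conf_level
z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}
n_raw = (z * sigma / E) ** 2
n_needed = math.ceil(n_raw)                 # always round up to a whole person

print(round(n_raw, 1), n_needed)
```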


Problem 8
a)
By computer: P(6.5 < X < 7) = P(X < 7) - P(X < 6.5) = 0.101
By hand: P(6.5 < X < 7) =
P((6.5-7.5)/1.8 < (X-μ)/σ < (7-7.5)/1.8) =
P(-0.56 < Z < -0.28) =
P(Z < -0.28) - P(Z < -0.56) =
P(Z > 0.28) - P(Z > 0.56) =
1-P(Z < 0.28) - (1-P(Z < 0.56)) =
1-0.6103 - (1-0.7123) = 0.7123-0.6103 = 0.1020
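The "by computer" value can be obtained, for example, with `NormalDist` from Python's standard library. This is one possible way to do it, not necessarily the software used for the solution.

```python
from statistics import NormalDist

wage = NormalDist(mu=7.50, sigma=1.80)      # model from the problem statement

# P(6.5 < X < 7) as a difference of two cumulative probabilities
prob = wage.cdf(7.00) - wage.cdf(6.50)

print(round(prob, 3))
```

The small gap between this value and the hand calculation comes from rounding the z values to two decimals before using the table.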

b) IQR = Q3-Q1 = P75-P25
by computer IQR = P75-P25 = 2.428
by hand: P75: We need to find x such that
0.75 = P(X < x) =
P((X-μ)/σ < (x-7.5)/1.8) =
P(Z < (x-7.5)/1.8), so
(x-7.5)/1.8 = 0.67 or x = 1.8×0.67+7.5 = 8.706
P25: We need to find x such that
0.25 = P(X < x) =
P((X-μ)/σ < (x-7.5)/1.8) =
P(Z < (x-7.5)/1.8),
so (x-7.5)/1.8 = -0.67 or x = 1.8×(-0.67)+7.5 = 6.294
Finally
IQR = Q3-Q1 = P75-P25 = 8.706-6.294 = 2.412
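The quartiles of the normal model can also be computed with the inverse cumulative distribution function. This sketch again uses `NormalDist` from the standard library as one possible tool.

```python
from statistics import NormalDist

wage = NormalDist(mu=7.50, sigma=1.80)      # model from the problem statement

q1 = wage.inv_cdf(0.25)     # 25th percentile
q3 = wage.inv_cdf(0.75)     # 75th percentile
iqr = q3 - q1

print(round(q1, 3), round(q3, 3), round(iqr, 3))
```

The hand calculation gives 2.412 instead because the table value 0.67 is a rounded version of the z quantile, which is closer to 0.6745.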