Generalized linear models (GLMs) extend linear models to allow both non-normal response distributions and transformations to linearity. The most important reference is McCullagh and Nelder (1989).
What we would like to do is develop a model for predicting "failure" from "temperature". But here the response variable "failure" is discrete: it has a Bernoulli distribution with "success" (!) probability p. So rather than trying to predict "failure" directly, we will model p. This is done via logistic regression:
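We have responses Y1, ..., Yn, which are independent with Yi ~ Bernoulli(p_i), so E[Yi] = p_i. We assume that p_i is related to the predictor value x_i via the equation

$$\log\left(\frac{p_i}{1-p_i}\right) = a + b\,x_i, \qquad\text{equivalently}\qquad p_i = \frac{e^{a+bx_i}}{1+e^{a+bx_i}}.$$

The function log(p/(1-p)), the logit or log-odds, is the link function connecting p to the linear predictor a + bx.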
Notice that, as in simple linear regression, b = 0 corresponds to no relationship between x and y. Moreover, b is the change in the log-odds of success corresponding to a one-unit increase in x. Equivalently, each one-unit increase in x multiplies the odds p/(1-p) by e^b.
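The following minimal Python sketch makes the interpretation of b concrete. The notes themselves use R. The coefficients a = 5 and b = -0.1 are hypothetical and chosen only for illustration. The sketch checks numerically that moving from x to x + 1 changes the log-odds by exactly b and multiplies the odds by e^b.

```python
import math


def logit(p):
    """Log-odds log(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor a + b*x to a probability."""
    return 1 / (1 + math.exp(-eta))


def odds(p):
    """Odds p / (1 - p) of a probability p."""
    return p / (1 - p)


# Hypothetical coefficients, chosen only for illustration.
a, b = 5.0, -0.1


def p_of_x(x):
    """Success probability under the logistic model log(p/(1-p)) = a + b*x."""
    return inv_logit(a + b * x)


x = 60.0
# A one-unit increase in x changes the log-odds by exactly b ...
delta_log_odds = logit(p_of_x(x + 1)) - logit(p_of_x(x))
# ... which means the odds are multiplied by exp(b).
odds_ratio = odds(p_of_x(x + 1)) / odds(p_of_x(x))
print(f"change in log-odds: {delta_log_odds:.6f}, b = {b}")
print(f"odds ratio: {odds_ratio:.6f}, exp(b) = {math.exp(b):.6f}")
```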
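How do we fit such a model, that is, find a and b? For linear regression we used the method of least squares. That works because the fitted values a + bx can be compared directly with the observed y. That is not the case here: the observations are 0s and 1s, while the model describes p, which depends on x only through the link function. Instead we will use maximum likelihood for the estimation. The log-likelihood is given by

$$\ell(a,b) = \sum_{i=1}^{n}\left[y_i\log p_i + (1-y_i)\log(1-p_i)\right] = \sum_{i=1}^{n}\left[y_i\,(a+bx_i) - \log\left(1+e^{a+bx_i}\right)\right]$$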
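and we can then find the MLEs by setting the derivatives with respect to a and b equal to zero. These equations have no closed-form solution, so they are solved numerically; R uses iteratively reweighted least squares. In R this is done using the command glm with family=binomial, which we run in shuttle.fun(3). In shuttle.fun(4) we draw the fitted curve and see that at the expected launch temperature of 32°F the estimated failure probability is essentially 1.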
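The likelihood equations can be solved by Newton-Raphson. The Python sketch below is neither the shuttle analysis nor R's glm code.

- It simulates hypothetical data: true a = 0.5, b = 1.5, with x uniform on [-2, 2].
- It maximizes the log-likelihood with plain Newton steps.
- It uses the score Σ(y_i - p_i)(1, x_i) and the information matrix Σ p_i(1 - p_i)[[1, x_i], [x_i, x_i²]].

For the logit link the observed and expected information coincide, so these Newton steps are the same as Fisher scoring, i.e. iteratively reweighted least squares.

```python
import math
import random


def inv_logit(eta):
    """Inverse logit, p = 1 / (1 + exp(-eta))."""
    return 1 / (1 + math.exp(-eta))


def score_and_info(a, b, xs, ys):
    """Score (u0, u1) and information entries (i00, i01, i11) of the Bernoulli log-likelihood.

    U = sum (y - p) * (1, x);  I = sum p(1 - p) * [[1, x], [x, x^2]].
    """
    u0 = u1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = inv_logit(a + b * x)
        w = p * (1 - p)
        u0 += y - p
        u1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return u0, u1, i00, i01, i11


def fit_logistic(xs, ys, tol=1e-10, max_iter=50):
    """MLE of (a, b) by plain Newton-Raphson steps (no step-halving safeguards)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11 = score_and_info(a, b, xs, ys)
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system I * (da, db) = U.
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Simulated data with hypothetical true values (not the shuttle data).
random.seed(1)
A_TRUE, B_TRUE = 0.5, 1.5
xs = [random.uniform(-2, 2) for _ in range(500)]
ys = [1 if random.random() < inv_logit(A_TRUE + B_TRUE * x) else 0 for x in xs]

a_hat, b_hat = fit_logistic(xs, ys)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

A production fitter would add safeguards such as step halving and a check for complete separation, where the MLE does not exist. This sketch omits them.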
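Here the density of a response y is assumed to have the exponential-family form

$$f(y;\theta,\varphi) = \exp\left\{\frac{A\,[\,y\theta - \gamma(\theta)\,]}{\varphi} + \tau\!\left(y,\frac{\varphi}{A}\right)\right\},$$

where θ is a parameter, φ is a scale parameter (possibly known), A is a known prior weight, and γ and τ are known functions. It can be shown that the mean is μ = E[y] = γ'(θ), so that θ = (γ')⁻¹(μ).
If φ is known, the distribution of y is a one-parameter exponential family.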
There are many standard problems that fit into this general framework:
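Gaussian: For the normal distribution φ = σ² and we have

$$\log f(y) = \frac{1}{\varphi}\left(y\mu - \frac{\mu^2}{2} - \frac{y^2}{2}\right) - \frac{1}{2}\log(2\pi\varphi),$$

so θ = μ and γ(θ) = θ²/2.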
Poisson: Now φ = 1 and

$$\log f(y) = y\log\mu - \mu - \log(y!),$$

so θ = log μ and γ(θ) = e^θ.
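Binomial: For the binomial distribution with a fixed (known) number of trials n and success parameter p, we take the response to be y = s/n, where s is the number of successes. Then

$$\log f(y) = n\left[y\log\frac{p}{1-p} + \log(1-p)\right] + \log\binom{n}{ny}.$$

So the log-density has the form n[yθ - γ(θ)] plus terms free of θ. Here φ = 1, θ = log(p/(1-p)) is the logit of p, and γ(θ) = -log(1-p) = log(1+e^θ).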
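As a quick numerical check of μ = γ'(θ), i.e. θ = (γ')⁻¹(μ), the Python sketch below differentiates γ numerically for the three families:

- Gaussian: θ = μ and γ(θ) = θ²/2.
- Poisson: θ = log μ and γ(θ) = e^θ.
- Binomial (with μ = p): θ = log(μ/(1-μ)) and γ(θ) = log(1+e^θ).

The test values of μ are arbitrary. The central-difference step h is an illustrative choice.

```python
import math

# For each family: (gamma(theta), theta as a function of mu). In the exponential-family
# form the log-density is a known weight times [y*theta - gamma(theta)], divided by phi,
# plus terms that do not involve theta.
FAMILIES = {
    "gaussian": (lambda t: t * t / 2, lambda mu: mu),
    "poisson": (math.exp, math.log),
    "binomial": (lambda t: math.log1p(math.exp(t)), lambda mu: math.log(mu / (1 - mu))),
}


def derivative(f, t, h=1e-5):
    """Central-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


recovered = {}
for name, (gamma, theta_of_mu) in FAMILIES.items():
    for mu in (0.1, 0.3, 0.7):  # means that are valid for all three families
        theta = theta_of_mu(mu)
        recovered[(name, mu)] = derivative(gamma, theta)
        print(f"{name:8s} mu = {mu}: gamma'(theta) = {recovered[(name, mu)]:.8f}")
```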
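The family functions for GLMs included in R are gaussian, binomial, poisson, inverse.gaussian and Gamma (capitalized, since gamma() is R's gamma function). R also provides the quasi-likelihood families quasi, quasibinomial and quasipoisson.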
Each response distribution allows a variety of link functions to connect the mean with the linear predictor. Those automatically available are listed below; D marks the default link for each family and • another available link:
| Link | binomial | Gamma | gaussian | inverse.gaussian | poisson |
|---|---|---|---|---|---|
| logit | D | | | | |
| probit | • | | | | |
| cloglog | • | | | | |
| identity | | • | D | | • |
| inverse | | D | | | |
| log | | • | | | D |
| 1/μ² | | | | D | |
| sqrt | | | | | • |
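The links in the table are simple functions η = g(μ) with explicit inverses. The Python sketch below writes them out; in R the same functions can be obtained from make.link(). It then checks that each inverse undoes its link at an arbitrary test mean μ = 0.4.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used by the probit link

# Each entry: (link g mapping mu to eta, inverse link mapping eta back to mu).
LINKS = {
    "logit": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)), lambda eta: 1 - math.exp(-math.exp(eta))),
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "log": (math.log, math.exp),
    "1/mu^2": (lambda mu: 1 / mu ** 2, lambda eta: 1 / math.sqrt(eta)),
    "sqrt": (math.sqrt, lambda eta: eta ** 2),
}

mu = 0.4  # lies in (0, 1), so it is a valid mean for every link above
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:8s} eta = {eta: .4f}  back-transformed mu = {g_inv(eta):.4f}")
```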
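Say we have n observations from a GLM. Observation i has prior weight A_i and parameter θ_i = (γ')⁻¹(μ_i), where the mean μ_i depends on the predictors through the link function. Then the log-likelihood is given by

$$\ell = \sum_{i=1}^{n}\log f(y_i;\theta_i,\varphi) = \sum_{i=1}^{n}\left\{\frac{A_i\,[\,y_i\theta_i - \gamma(\theta_i)\,]}{\varphi} + \tau\!\left(y_i,\frac{\varphi}{A_i}\right)\right\}.$$

Here τ does not involve θ_i, so only the first term matters for estimating the regression coefficients.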
In solder.fun(4) we now fit a GLM with all five predictors and check the ANOVA (analysis of deviance) table. All five factors are highly significant.
Do we need to worry about interactions? In solder.fun(5) we refit the model including all second-order (two-way interaction) terms, and indeed almost all of them are significant. In signal.fun(6) we check the model assumptions.
To understand what a method does, it often helps to run a small simulation. In glm.sim(1) we generate data as follows. We let x be 100 equally spaced values from 0 to 1 and generate 100 Poisson rates λ using log(λ) = -1 + 4x + ε with ε ~ N(0, 0.15). Then we generate 100 observations y ~ Pois(λ). One realization is shown in glm.sim(1). We also print the coefficients of the glm fit, which match the true values (-1 and 4) quite well. Because E[y] = λ for the Poisson distribution, we can even add the fitted curve to the plot.
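Here is a rough Python analogue of the simulation in glm.sim(1), offered as a sketch rather than a transcription of the course's R code. It makes two assumptions:

- N(0, 0.15) is read as standard deviation 0.15.
- The Poisson regression log(μ) = a + bx is fitted by plain Newton-Raphson instead of calling glm.

The Poisson sampler uses Knuth's multiplication method, which is adequate for the moderate rates that occur here. The seed is arbitrary.

```python
import math
import random


def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(xs, ys, tol=1e-10, max_iter=100):
    """MLE of (a, b) in log(mu) = a + b*x by plain Newton-Raphson steps."""
    a, b = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        # Score U = sum (y - mu)(1, x); information I = sum mu [[1, x], [x, x^2]].
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(a + b * x)
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += mu
            i01 += mu * x
            i11 += mu * x * x
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


rng = random.Random(2024)
n = 100
xs = [i / (n - 1) for i in range(n)]  # 100 equally spaced values on [0, 1]
# Assumption: N(0, 0.15) means mean 0 and standard deviation 0.15.
lams = [math.exp(-1 + 4 * x + rng.gauss(0, 0.15)) for x in xs]
ys = [rpois(lam, rng) for lam in lams]

a_hat, b_hat = fit_poisson(xs, ys)
print(f"a_hat = {a_hat:.3f} (true -1), b_hat = {b_hat:.3f} (true 4)")

# Since E[y] = lambda, the fitted mean curve exp(a_hat + b_hat*x) can be drawn over the data.
fitted = [math.exp(a_hat + b_hat * x) for x in xs]
```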
How do we find confidence intervals? Of course there is a predict method for glm fits, and it returns standard errors when called with se.fit=TRUE (se=T for short). Say we want to find a 95% CI for λ at x = 0.5. The true value is λ = exp(-1 + 4·0.5) = e¹ ≈ 2.71828. We find the limits in glm.sim(2).
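One common way to build the interval is to form η̂ ± 1.96·se(η̂) on the log (link) scale and then exponentiate. We assume glm.sim(2) does something similar. In R, predict(fit, newdata, type = "link", se.fit = TRUE) supplies η̂ and its standard error. The Python sketch below repeats the simulation with an arbitrary seed. It computes se(η̂) at x0 = 0.5 from the inverse information matrix and transforms the limits.

```python
import math
import random
from statistics import NormalDist


def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(xs, ys, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE for log(mu) = a + b*x; returns a, b and the inverse information."""
    a, b = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(a + b * x)
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += mu
            i01 += mu * x
            i11 += mu * x * x
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Estimated covariance of (a_hat, b_hat): inverse information at the last iterate.
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return a, b, cov


rng = random.Random(2024)
xs = [i / 99 for i in range(100)]
ys = [rpois(math.exp(-1 + 4 * x + rng.gauss(0, 0.15)), rng) for x in xs]
a_hat, b_hat, cov = fit_poisson(xs, ys)

x0 = 0.5
eta0 = a_hat + b_hat * x0  # estimated log(lambda) at x0
# Var(a_hat + b_hat*x0) = Var(a_hat) + x0^2 Var(b_hat) + 2 x0 Cov(a_hat, b_hat)
se_eta0 = math.sqrt(cov[0][0] + x0 ** 2 * cov[1][1] + 2 * x0 * cov[0][1])
z = NormalDist().inv_cdf(0.975)  # about 1.96
lower, upper = math.exp(eta0 - z * se_eta0), math.exp(eta0 + z * se_eta0)
print(f"lambda_hat(0.5) = {math.exp(eta0):.3f}, 95% CI ({lower:.3f}, {upper:.3f}); true value e = {math.e:.5f}")
```

Exponentiating the limits keeps them positive and makes the interval asymmetric around λ̂, which suits a rate.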
Are these good confidence intervals? In glm.sim(3) we run a small coverage study, checking how often such intervals contain the true λ, and find that they work quite well.
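A coverage study repeats the simulation many times and records how often the interval contains the true λ = e. The Python sketch below is a hypothetical stand-in for glm.sim(3). It uses 500 replications and a log-scale Wald interval. The N(0, 0.15) term adds some extra-Poisson variation, so the estimated coverage may come out slightly below the nominal 95%.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(xs, ys, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE for log(mu) = a + b*x; returns a, b and the inverse information."""
    a, b = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(a + b * x)
            u0 += y - mu
            u1 += (y - mu) * x
            i00 += mu
            i01 += mu * x
            i11 += mu * x * x
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return a, b, cov


def ci_lambda(x0, xs, ys):
    """Wald interval for lambda(x0), built on the log scale and exponentiated."""
    a, b, cov = fit_poisson(xs, ys)
    eta0 = a + b * x0
    se = math.sqrt(cov[0][0] + x0 ** 2 * cov[1][1] + 2 * x0 * cov[0][1])
    return math.exp(eta0 - Z * se), math.exp(eta0 + Z * se)


rng = random.Random(7)
xs = [i / 99 for i in range(100)]
true_lam = math.exp(-1 + 4 * 0.5)  # = e
n_rep = 500
hits = 0
for _ in range(n_rep):
    ys = [rpois(math.exp(-1 + 4 * x + rng.gauss(0, 0.15)), rng) for x in xs]
    lower, upper = ci_lambda(0.5, xs, ys)
    hits += lower <= true_lam <= upper  # True counts as 1
coverage = hits / n_rep
print(f"estimated coverage: {coverage:.3f} (nominal 0.95)")
```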
In birth.fun(2) we fit a glm with family="binomial" and use the step function for stepwise (AIC-based) model selection. Then we see how many of the cases in the dataset are correctly predicted.
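Counting correct predictions requires a rule that turns fitted probabilities into 0/1 predictions. The Python sketch below assumes a cutoff of 0.5; the notes do not say which cutoff birth.fun(2) uses. It tabulates observed against predicted classes. The responses and probabilities are made up for illustration. In R the fitted probabilities would come from fitted(fit) or predict(fit, type = "response").

```python
def classification_table(y_obs, p_fit, cutoff=0.5):
    """Counts of (observed, predicted) pairs, predicting 1 when p_fit > cutoff."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(y_obs, p_fit):
        table[(y, int(p > cutoff))] += 1
    return table


# Hypothetical observed responses and fitted probabilities, for illustration only.
y_obs = [1, 0, 0, 1, 1, 0, 1, 0]
p_fit = [0.81, 0.35, 0.62, 0.55, 0.18, 0.09, 0.71, 0.44]

table = classification_table(y_obs, p_fit)
correct = table[(0, 0)] + table[(1, 1)]
print(table, f"correctly predicted: {correct}/{len(y_obs)}")
```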