GLM overview
In a GLM, a smooth monotonic link function $g$ connects the expectation $\mu_i = E(Y_i)$ with the linear combination of covariates $X_i\beta$,
$$g(\mu_i) = X_i\beta,$$
where $Y_i$ follows an exponential family distribution
In a generalized linear mixed model (GLMM), we have
$$g(\mu_i) = X_i\beta + Z_i b, \qquad b \sim N(0, \psi),$$
where $Z_i$ is the $i$th row of the random effects model matrix and $\mu_i = E(Y_i \mid b)$
Theory of GLMs
Exponential family
Exponential family of distributions
- The density function for an exponential family distribution:
$$f_\theta(y) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$
- $a(\cdot)$, $b(\cdot)$, $c(\cdot)$: arbitrary functions
- $\phi$: an arbitrary scale parameter
- $\theta$: the canonical parameter; it depends completely on the model parameters
Properties of the exponential family mean and variance
$$E(Y) = b'(\theta), \qquad \mathrm{Var}(Y) = b''(\theta)\, a(\phi)$$
- In most practical cases, $a(\phi) = \phi/\omega$, where $\omega$ is a known constant
- We define the variance function $V(\mu) = b''(\theta)/\omega$, so that $\mathrm{Var}(Y) = V(\mu)\,\phi$
Exponential family examples
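As one hedged illustration (the slide's own examples are not reproduced here), the Poisson distribution with mean $\mu$ fits this framework:
$$f(y) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left\{ y\log\mu - \mu - \log(y!) \right\},$$
so $\theta = \log\mu$, $b(\theta) = e^\theta$, $a(\phi) = 1$, and $c(y, \phi) = -\log(y!)$. Consistent with the properties above, $E(Y) = b'(\theta) = \mu$ and $\mathrm{Var}(Y) = b''(\theta)\, a(\phi) = \mu$, so the variance function is $V(\mu) = \mu$.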
Iteratively re-weighted least squares (IRLS)
Fitting GLMs
For the GLM $g(\mu_i) = X_i\beta$ with $Y_i$ from an exponential family distribution, assuming $a_i(\phi) = \phi/\omega_i$, the log likelihood is
$$l(\beta) = \sum_{i=1}^n \left[ \frac{\omega_i\{y_i\theta_i - b(\theta_i)\}}{\phi} + c(y_i, \phi) \right]$$
- To optimize $l(\beta)$, we use Newton's method, which is an iterative optimization approach: $\beta^{(k+1)} = \beta^{(k)} - \left(\nabla^2 l\right)^{-1} \nabla l$
- where both $\nabla l$ and $\nabla^2 l$ are evaluated at the current iterate $\beta^{(k)}$
- Alternatively, we can use the Fisher scoring variant of Newton's method, by replacing the Hessian matrix $\nabla^2 l$ with its expectation $E\left(\nabla^2 l\right)$
Next, we will need to compute the gradient vector and expected Hessian matrix of $l(\beta)$
Compute the gradient vector and expected Hessian of $l(\beta)$
By the chain rule,
$$\frac{\partial l_i}{\partial \beta_j} = \frac{\partial l_i}{\partial \theta_i}\,\frac{\partial \theta_i}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \beta_j} = \frac{\omega_i (y_i - \mu_i)}{\phi} \cdot \frac{1}{b''(\theta_i)} \cdot \frac{X_{ij}}{g'(\mu_i)} = \frac{(y_i - \mu_i)\, X_{ij}}{\phi\, V(\mu_i)\, g'(\mu_i)}$$
Therefore, the gradient vector of $l(\beta)$ is
$$\nabla l(\beta) = \sum_{i=1}^n \frac{y_i - \mu_i}{\phi\, V(\mu_i)\, g'(\mu_i)}\, X_i^T$$
The expected Hessian (expectation taken wrt $Y$) is
$$E\left(\nabla^2 l\right) = -\sum_{i=1}^n \frac{X_i^T X_i}{\phi\, V(\mu_i)\, g'(\mu_i)^2}$$
The Fisher scoring update
Define the matrices
$$W = \mathrm{diag}(w_i), \quad w_i = \frac{1}{V(\mu_i)\, g'(\mu_i)^2}, \qquad G = \mathrm{diag}\{g'(\mu_i)\}$$
The Fisher scoring update for $\beta$ is
$$\beta^{(k+1)} = \beta^{(k)} + \left(X^T W X\right)^{-1} X^T W\, G\,(y - \mu^{(k)}) = \left(X^T W X\right)^{-1} X^T W z,$$
where $z = G\,(y - \mu^{(k)}) + X\beta^{(k)}$ is the pseudodata
Iteratively re-weighted least squares (IRLS) algorithm
Step 1 (initialization): set $\hat\mu_i = y_i + \delta_i$ and $\hat\eta_i = g(\hat\mu_i)$
- $\delta_i$ is usually zero, but may be a small constant ensuring that $\hat\eta_i$ is finite
Step 2: compute the pseudodata $z_i = g'(\hat\mu_i)(y_i - \hat\mu_i) + \hat\eta_i$ and iterative weights $w_i = \dfrac{1}{V(\hat\mu_i)\, g'(\hat\mu_i)^2}$
Step 3: find $\hat\beta$ by minimizing the weighted least squares objective $\sum_{i=1}^n w_i (z_i - X_i\beta)^2$, then update $\hat\eta = X\hat\beta$ and $\hat\mu_i = g^{-1}(\hat\eta_i)$
- Repeat Steps 2-3 until the change in deviance is near zero (a numpy sketch of this loop is given below)
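Below is a minimal numpy sketch of this loop, not from the slides; the function name and the convergence check on the change in $\hat\beta$ (rather than the change in deviance) are simplifying assumptions.

```python
import numpy as np

def irls(X, y, V, g, g_prime, g_inv, delta=0.0, max_iter=100, tol=1e-10):
    """Basic IRLS sketch: V is the variance function, g the link,
    g_prime its derivative, g_inv its inverse, delta the Step 1 offset."""
    mu = y + delta                                    # Step 1: initialize fitted means
    eta = g(mu)                                       # ... and the linear predictor
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = g_prime(mu) * (y - mu) + eta              # Step 2: pseudodata
        w = 1.0 / (V(mu) * g_prime(mu) ** 2)          # Step 2: iterative weights
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)  # Step 3: weighted least squares
        eta = X @ beta_new
        mu = g_inv(eta)
        if np.max(np.abs(beta_new - beta)) < tol:     # simplified convergence check
            return beta_new
        beta = beta_new
    return beta
```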
IRLS example 1: logistic regression
For logistic regression, $Y_i \sim \mathrm{Binomial}(1, \mu_i)$ with the logit link $g(\mu_i) = \log\dfrac{\mu_i}{1 - \mu_i}$, so that $g'(\mu_i) = \dfrac{1}{\mu_i(1 - \mu_i)}$ and $V(\mu_i) = \mu_i(1 - \mu_i)$
Therefore, in Step 2 of IRLS (see also the numeric illustration below),
$$z_i = \frac{y_i - \hat\mu_i}{\hat\mu_i(1 - \hat\mu_i)} + \hat\eta_i, \qquad w_i = \hat\mu_i(1 - \hat\mu_i)$$
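For concreteness, here is a small self-contained numpy illustration of these Step 2 quantities on simulated data; the data-generating values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])                      # arbitrary illustrative values
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

mu = np.clip(y.astype(float), 0.25, 0.75)              # Step 1: pull 0/1 away from the boundary
eta = np.log(mu / (1.0 - mu))
for _ in range(25):
    z = (y - mu) / (mu * (1.0 - mu)) + eta             # pseudodata z_i
    w = mu * (1.0 - mu)                                # iterative weights w_i
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ z)         # Step 3: weighted least squares
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
print(beta)   # should be close to beta_true
```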
IRLS example 2: GLM with independent normal priors
Assume that the coefficient vector $\beta$ has independent normal priors, $\beta_j \sim N(0, \sigma_j^2)$
Log posterior density (we still call it $l(\beta)$, with some abuse of notation):
$$l(\beta) = \sum_{i=1}^n \frac{\omega_i\{y_i\theta_i - b(\theta_i)\}}{\phi} - \sum_j \frac{\beta_j^2}{2\sigma_j^2} + \mathrm{const}$$
Gradient vector and expected Hessian matrix (wrt $\beta$):
$$\nabla l(\beta) = \sum_{i=1}^n \frac{y_i - \mu_i}{\phi\, V(\mu_i)\, g'(\mu_i)}\, X_i^T - S\beta, \qquad E\left(\nabla^2 l\right) = -\frac{X^T W X}{\phi} - S, \qquad S = \mathrm{diag}(1/\sigma_j^2)$$
- Here, $W$ and the pseudodata $z$ are the same as in the unpenalized gradient and expected Hessian above
- IRLS for GLM with independent normal priors: Steps 1-2 are unchanged, and the Step 3 update becomes $\beta^{(k+1)} = \left(X^T W X + \phi S\right)^{-1} X^T W z$ (a short sketch follows this list)
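A minimal sketch of this modified Step 3, assuming the $N(0, \sigma_j^2)$ prior parameterization written above (the function and variable names are mine, not the slides'):

```python
import numpy as np

def penalized_irls_step(X, z, w, phi, sigma2):
    """One Fisher scoring step for a GLM whose coefficients have independent
    N(0, sigma2[j]) priors; z and w are the usual IRLS pseudodata and weights."""
    S = np.diag(1.0 / np.asarray(sigma2))              # prior precision matrix
    WX = X * np.asarray(w)[:, None]
    return np.linalg.solve(X.T @ WX + phi * S, WX.T @ np.asarray(z))
```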
Asymptotic consistency of MLE, deviance, tests, residuals
Large sample distribution of $\hat\beta$
Hessian of the negative log likelihood (also called observed information): $\hat{\mathcal{I}} = -\nabla^2 l(\beta)$
Fisher information, also called expected information: $\mathcal{I} = E\left(\hat{\mathcal{I}}\right) = \dfrac{X^T W X}{\phi}$
Asymptotic normality of the MLE: in the large sample limit, approximately $\hat\beta \sim N\left(\beta,\; \mathcal{I}^{-1}\right)$
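As a hedged sketch of how this result is used in practice, approximate standard errors can be read off the inverse expected information evaluated at the converged IRLS weights (the function name and inputs are assumptions):

```python
import numpy as np

def glm_standard_errors(X, w, phi):
    """Approximate standard errors of beta-hat from the expected information:
    Var(beta_hat) is approximately (X^T W X)^{-1} * phi."""
    WX = X * np.asarray(w)[:, None]
    cov = np.linalg.inv(X.T @ WX) * phi                # inverse Fisher information
    return np.sqrt(np.diag(cov))
```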
Deviance
Deviance is the GLM counterpart of the residual sum of squares in normal linear regression:
$$D = 2\left\{l_{\max} - l(\hat\beta)\right\}\phi = 2\sum_{i=1}^n \omega_i\left[y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\right]$$
- Here, $l_{\max}$ is the maximized log likelihood of the saturated model: the model with one parameter per data point. For an exponential family distribution, it is computed by simply setting $\hat\mu_i = y_i$.
- $\tilde\theta_i$ and $\hat\theta_i$ are the maximum likelihood estimates of the canonical parameters for the saturated model and the model of interest, respectively
From the second equality, we can see that the deviance is independent of $\phi$
For normal linear regression, the deviance equals the residual sum of squares $\sum_{i=1}^n (y_i - \hat\mu_i)^2$
Scaled deviance
The scaled deviance $D^* = D/\phi$ does depend on $\phi$
If the model is specified correctly, then approximately $D^* \sim \chi^2_{n - p}$, where $p$ is the number of identifiable model parameters
- To compare two nested models $H_0$ (with $p_0$ parameters and deviance $D_0$) and $H_1$ (with $p_1 > p_0$ parameters and deviance $D_1$),
- If $\phi$ is known, then under $H_0$, we can use $D_0^* - D_1^* \sim \chi^2_{p_1 - p_0}$ approximately
- If $\phi$ is unknown, then under $H_0$, we can use $F = \dfrac{(D_0 - D_1)/(p_1 - p_0)}{D_1/(n - p_1)} \sim F_{p_1 - p_0,\; n - p_1}$ approximately (see the worked computation below)
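A worked computation with scipy, using made-up deviances and model dimensions purely for illustration:

```python
from scipy import stats

# Hypothetical deviances and dimensions for two nested GLMs (illustrative only)
D0, p0 = 132.5, 3        # smaller model under H0
D1, p1 = 118.2, 6        # larger model under H1
n = 100

# phi known (e.g. phi = 1 for binomial or Poisson): chi-squared test
chi2_stat = (D0 - D1) / 1.0
p_chi2 = stats.chi2.sf(chi2_stat, df=p1 - p0)

# phi unknown: F test, with D1 / (n - p1) estimating phi
F_stat = ((D0 - D1) / (p1 - p0)) / (D1 / (n - p1))
p_F = stats.f.sf(F_stat, p1 - p0, n - p1)
print(p_chi2, p_F)
```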
Canonical link functions
The canonical link $g_c$ is the link function such that $g_c(\mu_i) = \theta_i$, where $\theta_i$ is the canonical parameter of the distribution. For example, the logit link is canonical for the binomial distribution and the log link is canonical for the Poisson.
Under canonical links, the observed information and the expected information matrices are the same
Under canonical links, since $V(\mu_i)\, g'(\mu_i) = 1/\omega_i$, the system of equations $\nabla l(\hat\beta) = 0$ that the MLE satisfies becomes $\sum_i \omega_i (y_i - \hat\mu_i) X_{ij} = 0$ for all $j$. Thus, if $\omega_i \equiv 1$, we have $X^T y = X^T \hat\mu$
- For any GLM with an intercept term and canonical link, the residuals sum to zero, i.e., $\sum_{i=1}^n (y_i - \hat\mu_i) = 0$
GLM residuals
Model checking is perhaps the most important part of applied statistical modeling
It is usual to standardize GLM residuals so that if the model assumptions are correct,
- the standardized residuals should have approximately equal variance, and
- behave like residuals from an ordinary linear model
Pearson residuals
$$\hat\epsilon_i^P = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}$$
- In practice, the distribution of the Pearson residuals can be quite asymmetric around zero. So the deviance residuals (introduced next) are often preferred.
Deviance residuals
Denote $d_i$ as the $i$th component in the deviance definition, so that the deviance is $D = \sum_{i=1}^n d_i$
By analogy with the ordinary linear model, we define the deviance residual
$$\hat\epsilon_i^d = \mathrm{sign}(y_i - \hat\mu_i)\, \sqrt{d_i}$$
- The sum of squares of the deviance residuals gives the deviance itself
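As a hedged illustration for the Poisson case, where $V(\mu) = \mu$ and $d_i = 2\{y_i\log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)\}$ with $y\log y$ taken as 0 at $y = 0$:

```python
import numpy as np

def poisson_residuals(y, mu):
    """Pearson and deviance residuals for a Poisson GLM with fitted means mu."""
    pearson = (y - mu) / np.sqrt(mu)                   # (y - mu) / sqrt(V(mu))
    ratio = np.where(y > 0, y / mu, 1.0)               # avoid log(0); y*log(y) = 0 at y = 0
    d = 2.0 * (y * np.log(ratio) - (y - mu))           # deviance components d_i
    return pearson, np.sign(y - mu) * np.sqrt(d)

# The sum of squared deviance residuals recovers the model deviance:
# np.sum(poisson_residuals(y, mu)[1] ** 2)
```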
Quasi-likelihood (GEE)
Quasi-likelihood
Consider an observation $y_i$ of a random variable $Y_i$ with mean $\mu_i$ and known variance function, $\mathrm{Var}(Y_i) = \phi\, V(\mu_i)$
- Getting the distribution of $Y_i$ exactly right is rather unimportant, as long as the mean-variance relationship is correct
Then the log quasi-likelihood for $\mu_i$, given $y_i$, is
$$q_i(\mu_i) = \int_{y_i}^{\mu_i} \frac{y_i - z}{\phi\, V(z)}\, dz$$
- The log quasi-likelihood for the mean vector $\mu$ of all the response data is $q(\mu) = \sum_{i=1}^n q_i(\mu_i)$
To obtain the maximum quasi-likelihood estimate of $\beta$, we differentiate $q(\mu)$ wrt $\beta$ and set the derivative to zero; the resulting equations are exactly the GLM maximum likelihood equations, so the estimate can be obtained through IRLS
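As a hedged worked example (not on the slides): for the variance function $V(\mu) = \mu$ with $\phi = 1$, the integral can be evaluated explicitly,
$$q_i(\mu_i) = \int_{y_i}^{\mu_i} \frac{y_i - z}{z}\, dz = \Big[\, y_i \log z - z \,\Big]_{y_i}^{\mu_i} = y_i\log\mu_i - \mu_i - (y_i\log y_i - y_i),$$
which matches the Poisson log likelihood $y_i\log\mu_i - \mu_i - \log(y_i!)$ up to terms not involving $\mu_i$, so maximum quasi-likelihood reproduces the Poisson GLM fit.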
Generalized Linear Mixed Models (GLMM)
Generalized linear mixed models (GLMM)
A GLMM for an exponential family random variable $Y_i$:
$$g(\mu_i) = X_i\beta + Z_i b, \qquad b \sim N(0, \psi),$$
where $\mu_i = E(Y_i \mid b)$ and $Z_i$ is the $i$th row of the random effects model matrix
Difficulty in moving from linear mixed models to GLMM: it is no longer possible to evaluate the marginal likelihood analytically
One effective solution is a Taylor expansion around $\hat{b}$, the posterior mode of $b$
Laplace approximation of GLMM marginal likelihood
For a GLM, note that the expected Hessian of the log likelihood with respect to $b$ is $-Z^T W Z/\phi$
- $W = \mathrm{diag}(w_i)$ is the IRLS weight matrix based on the $\mu_i$ implied by $\beta$ and $b$
Therefore, the approximate marginal likelihood is
$$L(\beta, \theta) \approx f(y \mid \hat{b})\, f(\hat{b})\, (2\pi)^{q/2}\, \left| \frac{Z^T W Z}{\phi} + \psi^{-1} \right|^{-1/2}, \qquad q = \dim(b)$$
Penalized likelihood and penalized IRLS
- The point estimators $\hat\beta$ and $\hat{b}$ are obtained by optimizing the penalized log likelihood $\log f(y \mid \beta, b) - \frac{1}{2}\, b^T \psi^{-1} b$
To simplify notation, we denote the combined model matrix $\mathcal{X} = [X, Z]$, the stacked coefficient vector $\mathcal{B} = (\beta^T, b^T)^T$, and the penalty matrix $S = \mathrm{blockdiag}(0, \psi^{-1})$
A penalized version of the IRLS algorithm (PIRLS): denoting the current estimate by $\mathcal{B}^{(k)}$, a single Newton update step is
$$\mathcal{B}^{(k+1)} = \left(\mathcal{X}^T W \mathcal{X} + \phi S\right)^{-1} \mathcal{X}^T W z,$$
where $z$ and $W$ are the IRLS pseudodata and weights computed at $\mathcal{B}^{(k)}$ (a numpy sketch of this update follows)
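A minimal numpy sketch of this single update, under the combined-matrix notation assumed above (function and argument names are mine):

```python
import numpy as np
from scipy.linalg import block_diag

def pirls_step(X, Z, z, w, phi, psi_inv):
    """One PIRLS update for the stacked coefficient vector (beta, b):
    the penalty is zero for beta and psi^{-1} for the random effects b."""
    Xc = np.hstack([X, Z])                             # combined model matrix [X, Z]
    S = block_diag(np.zeros((X.shape[1], X.shape[1])), psi_inv)
    WX = Xc * np.asarray(w)[:, None]
    return np.linalg.solve(Xc.T @ WX + phi * S, WX.T @ np.asarray(z))
```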
Penalized quasi-likelihood method
- Since optimizing the Laplace approximate marginal likelihood can be computationally costly, it is tempting to instead perform PIRLS iterations, estimating the variance parameters at each step based on the working mixed model for the pseudodata $z$
References
- Wood, Simon N. (2017), Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC