
Correlated data sets arise from repeated measures studies, in which multiple observations are collected from a specific sampling unit (e.g., a specific patient's status over time), or from grouped or clustered data, in which observations are grouped because they share some common characteristic (e.g., animals in the same litter). When measurements are collected over time, the terms longitudinal data or panel data are preferred. Generalized estimating equations (GEEs) provide a framework for analyzing correlated data. This framework extends the methodology of generalized linear models, which assumes independent observations. We discuss the estimation of model parameters and their associated variances via generalized estimating equation methodology.

The usual practice in model construction is to specify the systematic and random components of variation. Classical maximum likelihood models then rely on the validity of these specified components. Model construction proceeds from the specification of the components of variation to a likelihood and, ultimately, to an estimating equation. The estimating equation for maximum likelihood estimation is obtained by setting the derivative of the log-likelihood with respect to the parameters of interest equal to zero. Point estimates of the unknown parameters are obtained by solving this estimating equation.
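As a minimal illustration of this idea, the sketch below assumes a hypothetical one-parameter model in which every observation is Poisson with mean λ = exp(β). The log-likelihood is ℓ(β) = Σ[yi·β − exp(β) − log(yi!)], so the estimating equation is U(β) = Σyi − n·exp(β) = 0. The helper name `poisson_mle_newton` is illustrative only; it solves the equation by Newton–Raphson rather than in closed form.

```python
import math

def poisson_mle_newton(y, beta0=0.0, tol=1e-10, max_iter=100):
    """Solve the Poisson score equation U(beta) = 0 by Newton-Raphson.

    Illustrative assumption: y_i ~ Poisson(lambda) with lambda = exp(beta),
    so l(beta) = sum(y_i * beta - exp(beta) - log(y_i!)).
    """
    n = len(y)
    total = sum(y)
    beta = beta0
    for _ in range(max_iter):
        score = total - n * math.exp(beta)   # U(beta) = dl/dbeta
        info = n * math.exp(beta)            # -d2l/dbeta2 (information)
        step = score / info                  # Newton update solves U(beta) = 0
        beta += step
        if abs(step) < tol:                  # stop when the update is negligible
            break
    return beta

beta_hat = poisson_mle_newton([2, 3, 1, 4, 0])
```

Here the root has the closed form β̂ = log(ȳ), which makes the numerical answer easy to check. The sketch assumes at least one nonzero count; otherwise the estimating equation has no finite solution.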

Generalized Linear Models

The theory and an algorithm appropriate for obtaining maximum likelihood estimates when the response follows a distribution in the exponential family were introduced in 1972 by Nelder and Wedderburn. They introduced the term generalized linear model (GLM) to refer to a class of models that could be analyzed by a single algorithm. Both the theory and the practical application of GLMs have since received attention in many articles and books.

GLMs encompass a wide range of commonly used models, such as linear regression for continuous outcomes, logistic regression for binary outcomes, and Poisson regression for count outcomes. Specifying a particular GLM requires a link function that characterizes the relationship between the mean response and a vector of covariates. In addition, a GLM requires a variance function that expresses the variance of the outcome as a function of the mean.
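The link and variance specifications for the three models named above can be collected in a small table of functions. This is a sketch assuming the canonical link for each model (identity, logit, log); the `Family` class and the constant names are hypothetical, not part of any particular library.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Family:
    """Hypothetical container for one GLM specification."""
    name: str
    link: Callable[[float], float]       # g(mu) = eta
    inv_link: Callable[[float], float]   # mu = g^{-1}(eta)
    variance: Callable[[float], float]   # V(mu), with Var(y) = phi * V(mu)

# Linear regression: identity link, constant variance function.
GAUSSIAN = Family("gaussian/identity", lambda mu: mu, lambda eta: eta, lambda mu: 1.0)

# Logistic regression: logit link, Bernoulli variance function.
BINOMIAL = Family(
    "binomial/logit",
    lambda mu: math.log(mu / (1.0 - mu)),
    lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    lambda mu: mu * (1.0 - mu),
)

# Poisson regression: log link, variance equal to the mean.
POISSON = Family("poisson/log", math.log, math.exp, lambda mu: mu)
```

Keeping the link, its inverse, and the variance function together makes it clear that these three choices are all a single fitting algorithm needs to know about a particular model.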

The derivation of the iteratively reweighted least squares (IRLS) algorithm for fitting GLMs begins with the likelihood specification for the exponential family. Within each iteration, an updated estimate of the coefficient vector is obtained via weighted least squares, where the weights depend on the link and variance specifications. The estimation is iterated to convergence, where convergence may be defined, for example, as the change in the estimated coefficient vector being smaller than some tolerance.
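The IRLS loop described above can be sketched for a single case. This minimal version assumes NumPy and a Poisson model with log link; the function name `irls_poisson`, the starting values μ = y + 0.5, and the default tolerance are illustrative choices, not prescribed by the text. Each pass forms the working response z = η + (y − μ)(∂η/∂μ) and the weights (∂μ/∂η)²/V(μ), then solves a weighted least squares problem.

```python
import numpy as np

def irls_poisson(X, y, tol=1e-8, max_iter=50):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                        # start from the data, avoiding log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Log link: dmu/deta = mu and V(mu) = mu, so the weight
        # (dmu/deta)^2 / V(mu) reduces to mu.
        w = mu
        z = eta + (y - mu) / mu         # working response: eta + (y - mu) * deta/dmu
        XtW = X.T * w                   # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # weighted least squares step
        eta = X @ beta_new
        mu = np.exp(eta)
        # Convergence: change in the coefficient vector below the tolerance.
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

For an intercept-only design the fitted coefficient should equal log(ȳ), which gives a quick check on the loop; swapping in other links or variance functions changes only the lines that compute `w`, `z`, and `mu`.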

For any response that follows a member of the exponential family of distributions, f(y) = exp{[y θ − b(θ)]/φ + c(y, φ)}, where θ is the canonical parameter and φ is a proportionality constant, we can obtain maximum likelihood estimates of the p × 1 regression coefficient vector β by solving the estimating equation given by

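$$\Psi(\beta) = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\phi\, V(\mu_i)} \left( \frac{\partial \mu}{\partial \eta} \right)_i x_i^{\mathsf{T}} = \mathbf{0}_{p \times 1}$$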

In the estimating equation, xi is the ith row of the n × p covariate matrix X; μi = g⁻¹(xiβ) is the expected outcome E(yi) = b′(θi), expressed as a transformation of the linear predictor ηi = xiβ through the inverse of a monotonic (invertible) link function g(), so that g(μi) = ηi; (∂μ/∂η)i is the derivative of the mean with respect to the linear predictor, evaluated at ηi; and the variance function V(μi) is a function of the expected value that is proportional to the variance of the outcome, Var(yi) = φV(μi). The estimating equation is also known as the score equation because it sets the score vector Ψ(β) equal to zero.
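The score vector Ψ(β) can be evaluated directly from its definition. The sketch below assumes NumPy and a hypothetical helper `glm_score` that receives the inverse link, the derivative ∂μ/∂η (as a function of η), and the variance function as plain callables; checking that Ψ(β) vanishes at a known solution is a convenient sanity test.

```python
import numpy as np

def glm_score(beta, X, y, inv_link, dmu_deta, variance, phi=1.0):
    """Score vector Psi(beta) = sum_i x_i^T (y_i - mu_i) / (phi V(mu_i)) (dmu/deta)_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    eta = X @ np.asarray(beta, dtype=float)    # linear predictor eta_i = x_i beta
    mu = inv_link(eta)                          # mu_i = g^{-1}(eta_i)
    # Per-observation factor (y_i - mu_i) / (phi V(mu_i)) * (dmu/deta)_i
    factor = (y - mu) / (phi * variance(mu)) * dmu_deta(eta)
    return X.T @ factor                         # sums x_i^T * factor_i into a p-vector

# Gaussian/identity: the score equation reduces to the normal equations of OLS.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
psi = glm_score(beta_ols, X, y, lambda e: e, np.ones_like, np.ones_like)
```

For a canonical link, ∂μ/∂η = V(μ), so the factor simplifies and the score becomes Xᵀ(y − μ)/φ; the general form above is what is needed for non-canonical links.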

...
