
Maximum Likelihood Estimation

Maximum likelihood (ML) denotes an important framework for estimation and inference. It is a theory about estimating models (i.e., recovering parameters from samples) rather than specifying models (i.e., constructing models). Its logic rests on selecting the estimates that make the observed data the "most likely" relative to the other possible estimates. Thanks to its versatility, ML is widely treated as the gold standard for estimating advanced models, such as multilevel and structural equation models. This entry provides an overview of ML estimation.

When explicating ML estimation, it is essential to keep models, algorithms, and theory distinct. A model defines the estimand (i.e., the parameter to be estimated). An algorithm (or estimator) is the computational device used to obtain an estimate. A theory is the logical blueprint for building algorithms that yield warranted estimates for models. ML is thus a theory about estimation algorithms rather than about models per se.

A likelihood can be defined as the conditional probability (or, for continuous data, the probability density) of the data given a candidate estimate. The likelihood lover's principle stipulates that modelers favor the estimates that assign the highest likelihood to the data. ML theory can take a plurality of forms (e.g., full, restricted, robust), but the likelihood lover's principle unites them.

Suppose two jars are full of marbles: the first jar holds eight red and two green marbles, and the second holds two red and eight green. Suppose further that one jar was selected and a marble was drawn from it at random, and the marble is red. One intuitively guesses that the first jar was selected, and the likelihood lover's principle clarifies why this is a safe bet. The likelihood of drawing a red marble is 0.8 given the first jar but only 0.2 given the second. This logic exemplifies ML theory.
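The jar comparison can be written out in a few lines of Python. This is a minimal sketch: the dictionary layout and the `likelihood` helper are illustrative names, not part of the entry, and the marble counts are the ones given above.

```python
# Marble counts from the two-jar example.
jars = {
    "first jar": {"red": 8, "green": 2},
    "second jar": {"red": 2, "green": 8},
}

def likelihood(jar: dict, color: str) -> float:
    """P(observed color | jar): the share of marbles of that color in the jar."""
    return jar[color] / sum(jar.values())

observed = "red"

# Likelihood of the observed draw under each candidate jar.
likelihoods = {name: likelihood(jar, observed) for name, jar in jars.items()}

# The likelihood lover's principle: favor the candidate with the highest likelihood.
best = max(likelihoods, key=likelihoods.get)

print(likelihoods)  # {'first jar': 0.8, 'second jar': 0.2}
print(best)         # first jar
```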

Likelihood functions are the building blocks of ML estimation. Given the data and candidate parameter values, such a function outputs the joint likelihood of the observations. Likelihood functions are not probability functions, but there is a one-to-one correspondence between the two: a likelihood function is the matching probability function read as a function of the parameters, with the data held fixed. Examples include binomial, gamma, and normal likelihood functions. To compute likelihoods for actual data, statisticians evaluate the appropriate likelihood function at the observed values.
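A short Python sketch can make the correspondence concrete, assuming a binomial model and hypothetical data of 8 successes in 10 trials (the helper `binomial_likelihood` is an illustrative name). With the parameter fixed, the formula sums to 1 across all possible data; with the data fixed, the area under it across parameter values is not 1, which is one way to see that a likelihood function is not a probability distribution over the parameter.

```python
from math import comb

def binomial_likelihood(p: float, k: int, n: int) -> float:
    """Binomial pmf C(n, k) p^k (1 - p)^(n - k); read as a function of p, it is the likelihood."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical data: 8 successes in 10 trials.
k, n = 8, 10

# Parameter fixed (p = 0.8), data varying: a probability function, so it sums to 1.
total_over_data = sum(binomial_likelihood(0.8, j, n) for j in range(n + 1))

# Data fixed, parameter varying: the likelihood function. Midpoint-rule area over p in (0, 1).
steps = 100_000
area_over_p = sum(binomial_likelihood((i + 0.5) / steps, k, n) for i in range(steps)) / steps

print(round(total_over_data, 6))  # 1.0
print(round(area_over_p, 6))      # 0.090909
```

The integral is only a numerical check; for this binomial case the exact area is 1/(n + 1), here 1/11.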

Educational processes resemble natural lotteries, and their outcomes follow probability distributions. For example, IQ scores approximately follow a normal curve. Probability functions describe such distributions, so researchers working with educational data need only select the matching likelihood function to compute likelihoods. Specifying the likelihood function is thus the first and foremost step in ML estimation.
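As a minimal sketch, assuming independent observations and a normal model, the log-likelihood can be computed directly and used to compare candidate parameter values. The scores below are hypothetical, invented for illustration, and `normal_log_likelihood` is an illustrative helper rather than a library function.

```python
import math

def normal_log_likelihood(data, mu, sigma):
    """Log of the joint normal likelihood of independent observations."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)  # sum of squared deviations from mu
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma**2)

# Hypothetical IQ-like scores (illustrative only, not real data).
scores = [92, 97, 100, 104, 107]

# Two candidate parameter settings; the larger log-likelihood is favored.
ll_a = normal_log_likelihood(scores, mu=100, sigma=15)
ll_b = normal_log_likelihood(scores, mu=120, sigma=15)
print(ll_a > ll_b)  # True
```

Working on the log scale keeps the computation stable: a product of many small densities can underflow to zero, whereas the sum of their logarithms stays finite.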

An ML-based algorithm typically involves three more steps beyond specifying the likelihood function. All four steps can be briefly described as follows:

  • Construct a likelihood function. The model dictates the likelihood function (e.g., a normal likelihood function can be specified for data modeled as normally distributed).
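  • Take the logarithm of the likelihood function and simplify the result. Because the logarithm is strictly increasing, the log-likelihood peaks at the same parameter values as the likelihood, and it turns products into sums that are easier to differentiate.
  • Take the partial derivative of the log-likelihood function with respect to each parameter and set each resulting derivative equal to 0.
  • Solve the resulting system of equations for the parameter estimates.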
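For illustration, assume n independent observations x_1, …, x_n modeled as normally distributed with mean μ and variance σ² (a choice made here for the example, not one the entry prescribes). Under that assumption the four steps can be sketched as:

```latex
% Step 1: likelihood of independent normal observations
L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

% Step 2: log-likelihood (the product becomes a sum)
\ell(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2

% Step 3: partial derivatives set to 0
\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu) = 0,
\qquad
\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2}
  + \frac{1}{2\sigma^4}\sum_{i=1}^{n} (x_i - \mu)^2 = 0

% Step 4: solve the system
\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
```

In this case the system has a closed-form solution. Note that the ML variance estimate divides by n rather than n − 1, so it is biased downward in small samples.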

Solving such systems of equations can be difficult when many parameters are at play, even with the help of modern computers, and often no closed-form solution exists. Implementing the final step may therefore require an iterative algorithm (e.g., the Newton-Raphson or EM algorithm). These algorithms repeatedly refine a trial solution until successive values change by less than a set tolerance. As a caveat, there is no guarantee that they will converge on the correct ML estimate, or even converge at all.
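The caveat is easy to reproduce. Below is a minimal Newton-Raphson sketch in Python for a one-parameter problem whose score equation has no closed-form solution: the location μ of a logistic distribution with its scale fixed at 1. The model, the data, and the function names are illustrative assumptions, not taken from the entry, and the EM algorithm is not shown. Each iteration updates μ by subtracting the score divided by the second derivative of the log-likelihood, and stops once that step falls below a tolerance.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def newton_logistic_location(data, start, tol=1e-10, max_iter=50):
    """Newton-Raphson for the location mu of a logistic distribution (scale fixed at 1).

    Returns (estimate, converged).
    """
    mu = start
    for _ in range(max_iter):
        s = [sigmoid(x - mu) for x in data]
        score = sum(2 * si - 1 for si in s)            # first derivative of log-likelihood in mu
        hessian = -2 * sum(si * (1 - si) for si in s)  # second derivative (always <= 0)
        if hessian == 0:                               # zero curvature: Newton step undefined
            return mu, False
        step = score / hessian
        mu -= step                                     # Newton update: mu - score / hessian
        if abs(step) < tol:
            return mu, True
    return mu, False

data = [4.0, 5.0, 6.0]
print(newton_logistic_location(data, start=4.5))   # converged flag True, estimate close to 5.0
print(newton_logistic_location(data, start=50.0))  # converged flag False
```

Started near the data, the iteration settles on the estimate. Started far away, the log-likelihood is almost linear there (its curvature is nearly zero), so the first Newton step overshoots wildly and the function reports non-convergence. Sensible starting values and safeguards such as step-halving or a line search are common remedies.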

...
