One central goal of statistical inference is to estimate the parameters of a model. In a typical statistical analysis, a likelihood function, L(data; θ), describes the relationship between the data and the unknown parameters θ that govern the behavior of the data. A good estimate of θ is a value that maximizes the likelihood function, that is, the value "most likely" to have produced the data. This estimate is called the maximum likelihood estimate (MLE). In some cases, the MLE can be found analytically, but more often numerical methods are needed. The EM algorithm is one numerical technique for this purpose.
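For instance, when no analytic solution is convenient, the MLE can be computed by numerically minimizing the negative log-likelihood. The minimal sketch below does this for a normal sample with unknown mean and standard deviation; the simulated data and the scipy-based optimizer are illustrative assumptions, not part of the entry:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=2.0, size=200)  # illustrative sample

    def neg_log_likelihood(params, y):
        mu, log_sigma = params       # log-parameterize sigma to keep it positive
        sigma = np.exp(log_sigma)
        return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (y - mu)**2 / sigma**2)

    result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
    print(mu_hat, sigma_hat)  # close to the analytic MLEs: sample mean and SD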

The EM algorithm is an iterative algorithm for finding the MLEs of the parameters when the data are incomplete. The term incomplete refers to two situations. The first occurs when some of the data are missing, for example, because of difficulties in the data collection process. The second occurs when direct optimization of the likelihood function is difficult, but the problem becomes more tractable once latent variables are added; in a mixture model, for example, the unobserved component memberships play this role.

Although the two situations sound different, from a statistical point of view they are similar and share the same features. First, the complete data set is divided into an "observed" part and an "unobserved" part; the unobserved part can be either missing or latent. Second, direct optimization of the likelihood function based on the observed data alone might be difficult, but optimization based on the complete-data likelihood is manageable. The EM algorithm provides a bridge: it finds the MLEs for the observed data by working with the complete-data likelihood.

EM stands for expectation and maximization, the two essential steps in the algorithm: the expectation step (E-step) and the maximization step (M-step). The intuition behind the EM algorithm is simple: First, guess the unobserved values using their expected values (E-step); then pretend the guessed values were actually observed and estimate the parameters based on the resulting complete data (M-step). Because the M-step provides new estimates of the parameters, the unobserved values are guessed again based on these new estimates. This iterative process is repeated until convergence, that is, until two consecutive estimates of the parameters are very close.
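As a concrete sketch of this alternation, consider a two-component Gaussian mixture in which each observation's component membership is unobserved. The model, the starting values, and the stopping rule below are illustrative assumptions rather than prescriptions of the entry:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    # Illustrative data: mixture of N(0, 1) and N(4, 1); memberships unobserved
    y = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1, 50)])

    # Initial guesses for the mixture weight, means, and standard deviations
    pi, mu1, mu2, sd1, sd2 = 0.5, y.min(), y.max(), y.std(), y.std()

    for _ in range(200):
        # E-step: expected membership (responsibility) of component 1
        p1 = pi * norm.pdf(y, mu1, sd1)
        p2 = (1 - pi) * norm.pdf(y, mu2, sd2)
        r = p1 / (p1 + p2)

        # M-step: re-estimate parameters as if the responsibilities were observed
        pi_new = r.mean()
        mu1_new, mu2_new = np.average(y, weights=r), np.average(y, weights=1 - r)
        sd1_new = np.sqrt(np.average((y - mu1_new) ** 2, weights=r))
        sd2_new = np.sqrt(np.average((y - mu2_new) ** 2, weights=1 - r))

        # Converged when consecutive estimates are very close
        converged = abs(mu1_new - mu1) + abs(mu2_new - mu2) < 1e-8
        pi, mu1, mu2, sd1, sd2 = pi_new, mu1_new, mu2_new, sd1_new, sd2_new
        if converged:
            break

    print(pi, mu1, mu2, sd1, sd2)

Each pass guesses the memberships from the current parameters (E-step) and then re-estimates the parameters as if those guesses had been observed (M-step), stopping when consecutive estimates of the means barely change.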

This idea had been widely adopted in various disciplines for a long time, although the most frequently cited formulation of the EM algorithm is that of Dempster, Laird, and Rubin in 1977. They provided a rigorous framework for implementing the idea: The correct procedure is not to impute the individual unobserved values themselves, but rather the complete-data sufficient statistics, or more generally the (log-)likelihood function itself, by its conditional expectation given the observed data. Many similar formulations of the same idea had appeared prior to this paper.

The formulation of the EM algorithm is as follows. Let Yobs and Yunobs denote the observed and unobserved data, respectively. The observed-data likelihood is L(Yobs; θ), and the complete-data likelihood is L(Yobs, Yunobs; θ). The EM algorithm obtains the MLE θ̂ for L(Yobs; θ) via the following
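In its standard form, consistent with the conditional-expectation description above, the iteration alternates between two steps, where θ^(t) denotes the estimate at iteration t:

    \text{E-step: } \quad Q\left(\theta \mid \theta^{(t)}\right)
        = \mathbb{E}\left[\log L\left(Y_{\text{obs}}, Y_{\text{unobs}}; \theta\right)
          \,\middle|\, Y_{\text{obs}}, \theta^{(t)}\right]

    \text{M-step: } \quad \theta^{(t+1)}
        = \underset{\theta}{\arg\max}\; Q\left(\theta \mid \theta^{(t)}\right)

Under mild regularity conditions, each iteration does not decrease the observed-data likelihood L(Yobs; θ^(t)), and the sequence of estimates converges to a stationary point of it, typically the MLE θ̂.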

...
