
The analysis of the values of a set of random variables often becomes simpler if one posits that these variables are related to another set of variables, called latent variables, whose values are unobserved. Consider, for example, the two-dimensional data in Figure 1. This data set is complicated in the sense that it is multimodal and, thus, cannot be summarized by a standard distribution such as those comprising the exponential family. One way to simplify the analysis is to assume that the distribution of these data is a combination of two simple distributions, namely, two Gaussian distributions. Each data item was generated as follows. First, a value for a latent variable was sampled from a Bernoulli distribution. Next, if the latent variable was set to 0, then the data item was sampled from the first Gaussian distribution (with mean vector [3 3]^T); if the latent variable was set to 1, then the data item was sampled from the second Gaussian distribution (with mean vector [7 7]^T). Although the value of the latent variable is not observed, it is easy to use Bayes's rule to compute the distribution of the latent variable given the data item.
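The generative process and the Bayes's rule computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two mean vectors come from the text, but the Bernoulli parameter (0.5) and the identity covariance matrices are not given in the entry and are assumed here, and the function names are hypothetical.

```python
import math
import random

# Means are taken from the text. The mixing probability and the identity
# covariances are ASSUMPTIONS made only for this illustration.
MEANS = [(3.0, 3.0), (7.0, 7.0)]
PRIOR_Z1 = 0.5  # assumed P(z = 1)


def sample_item(rng):
    """Draw one data item: first the latent z, then x given z."""
    z = 1 if rng.random() < PRIOR_Z1 else 0          # z ~ Bernoulli(PRIOR_Z1)
    x = tuple(rng.gauss(m, 1.0) for m in MEANS[z])   # x ~ N(MEANS[z], I)
    return x, z


def gaussian_density(x, mean):
    """Density of a two-dimensional Gaussian with identity covariance."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-0.5 * sq_dist) / (2.0 * math.pi)


def posterior_z1(x):
    """P(z = 1 | x), obtained with Bayes's rule."""
    p0 = (1.0 - PRIOR_Z1) * gaussian_density(x, MEANS[0])
    p1 = PRIOR_Z1 * gaussian_density(x, MEANS[1])
    return p1 / (p0 + p1)


rng = random.Random(0)
data = [sample_item(rng) for _ in range(1000)]

# A point equidistant from both means is equally likely under either component.
print(posterior_z1((5.0, 5.0)))
```

Under these assumptions, a point close to [3 3]^T receives a posterior probability near 0 for the second component, while a point midway between the two means is assigned to both components with equal probability.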

This model is a latent variable model known as a mixture model. Mixture models provide a principled way of combining two or more simple distributions (e.g., unimodal distributions such as Gaussian distributions) into a single complicated (e.g., multimodal) distribution. As this example illustrates, mixture models are “piecewise estimators” in the sense that different components are used to summarize different subsets of the data. The subsets do not, however, have hard boundaries; as discussed below, a data item might be a member of multiple subsets simultaneously.

Mixtures-of-experts (ME) models are an extension of mixture models. They differ from conventional mixture models in that their mixture components are conditional probability distributions. Consequently, they are suitable for summarizing data sets in which the distribution of output or response variables depends on the values of input or covariate variables. Such data sets arise in the context of regression or classification tasks, for example.


Figure 1 Two-Dimensional Data to Be Summarized

ME models perform tasks using a “divide and conquer” strategy: complex tasks are decomposed into simpler subtasks. ME models can be characterized as fitting piecewise models to the data. The data are assumed to form a countable set of paired variables $X = \{(x^{(t)}, y^{(t)})\}_{t=1}^{T}$, where x is a vector of explanatory variables, also referred to as covariates, and y is a vector of responses. ME models divide the covariate space, meaning the space of all possible values of the explanatory variables, into regions, and then they fit simple surfaces to the data that fall in each region. Unlike many other piecewise approximators, these models use regions that are not disjoint. The regions have “soft” boundaries, meaning that data points may lie simultaneously in multiple regions. In addition, the boundaries between regions are themselves simple parameterized surfaces whose parameter values are estimated from the data.
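The idea of soft regions with simple surfaces fitted in each can be illustrated with a small Python sketch. It assumes a one-dimensional covariate, two linear experts, and a softmax gate that is linear in x; these are common choices but are not specified in the passage above. All parameter values are hypothetical and set by hand rather than estimated from data.

```python
import math

# Hypothetical, hand-picked parameters (in practice they would be estimated
# from the data). Each pair is (slope, intercept).
GATE_PARAMS = [(-2.0, 10.0), (2.0, -10.0)]   # gate scores cross at x = 5
EXPERT_PARAMS = [(1.0, 0.0), (-1.0, 10.0)]   # one simple linear surface per region


def gate_weights(x):
    """Soft region memberships: a softmax over linear gate scores."""
    scores = [w * x + b for w, b in GATE_PARAMS]
    top = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expert_means(x):
    """Mean response predicted by each expert at covariate x."""
    return [a * x + c for a, c in EXPERT_PARAMS]


def predict_mean(x):
    """Overall mean response: expert means blended by the gate weights."""
    return sum(g * m for g, m in zip(gate_weights(x), expert_means(x)))


for x in (0.0, 4.5, 5.0, 10.0):
    print(x, gate_weights(x), predict_mean(x))
```

Far from x = 5 a single expert dominates, whereas near x = 5 both gate weights are clearly nonzero, so a data point there belongs to both regions at once. The boundary itself is determined by the gate parameters, which is one way to realize a parameterized boundary surface.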

...
