
Latent class analysis (LCA) is a method for identifying unlabeled groups of individuals or cases in a data set based on multivariate categorical data. LCA is used in psychology, sociology, and many other areas of application to cluster, or partition, individuals into underlying groups. The data recorded on each member of the sample are a series of measurements on categorical or qualitative variables that record discrete characteristics of the units. In many social science applications, the data consist of responses on a discrete scale to a series of questions. The latent class model (LCM) is a probability model that describes the distribution of responses to the questions within each of the separate groups. Estimation of the model parameters leads to a characterization of the underlying groups in terms of likely patterns of responses to the questions.

LCA is implemented by estimating the parameters, or proportions, of the LCM. Suppose there are G latent groups in the model, and three categorical variables (A, B, and C) are measured on each case. The latent class model can be expressed as

$$p^{ABC}_{abc} = \sum_{g=1}^{G} p^{G}_{g}\, p^{AG}_{ag}\, p^{BG}_{bg}\, p^{CG}_{cg} \tag{1}$$

On the left-hand side of this formula, $p^{ABC}_{abc}$ is the probability of observing the values a, b, and c on Variables A, B, and C, respectively. On the right-hand side, $p^{G}_{g}$ is the proportion of the units in class g, $p^{AG}_{ag}$ is the probability of response a to Variable A among units in class g, $p^{BG}_{bg}$ is the probability of response b to Variable B in class g, and $p^{CG}_{cg}$ is the probability of response c to Variable C in class g. In the LCM, Variables A, B, and C are assumed to be conditionally independent within each of the groups indexed by g. This assumption is expressed in Equation 1 by the fact that the probability of response (a,b,c) in class g is found by multiplying the probabilities $p^{AG}_{ag}$, $p^{BG}_{bg}$, and $p^{CG}_{cg}$.
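The right-hand side of Equation 1 can be evaluated directly once parameter values are given. The following is a minimal sketch in Python with NumPy; the function name `pattern_probability` and all numerical values (two classes, three binary variables) are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def pattern_probability(class_props, cond_probs, pattern):
    """Probability of one response pattern under the latent class model.

    class_props: length-G sequence of class proportions.
    cond_probs:  one table per variable, each of shape (G, n_categories);
                 row g holds the response probabilities within class g.
    pattern:     0-based category indices, one per variable, e.g. (a, b, c).
    """
    # Start from the class proportions, then multiply in one conditional
    # probability per variable (conditional independence within a class).
    within_class = np.asarray(class_props, dtype=float)
    for table, value in zip(cond_probs, pattern):
        within_class = within_class * np.asarray(table, dtype=float)[:, value]
    # Sum over the G classes.
    return float(within_class.sum())

# Hypothetical parameter values: G = 2 classes, three binary variables.
class_props = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],  # Variable A: rows are classes
    [[0.8, 0.2], [0.3, 0.7]],  # Variable B
    [[0.7, 0.3], [0.1, 0.9]],  # Variable C
]

p_000 = pattern_probability(class_props, cond_probs, (0, 0, 0))

# The probabilities of all 2 x 2 x 2 patterns should add up to 1.
total = sum(
    pattern_probability(class_props, cond_probs, pattern)
    for pattern in itertools.product(range(2), repeat=3)
)
```

With these values, the first class contributes 0.6 × 0.9 × 0.8 × 0.7 = 0.3024 and the second contributes 0.4 × 0.2 × 0.3 × 0.1 = 0.0024, so `p_000` is 0.3048.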

None of the proportions in Equation 1 needs to be known a priori in order to use LCA. If $n_{abc}$ is the number of respondents giving response (a,b,c) to Variables (A,B,C), then the statistical likelihood for the model probabilities is the product, over all patterns (a,b,c) that have been observed, of $p^{ABC}_{abc}$ raised to the power $n_{abc}$. The maximum likelihood estimates of the probabilities $p^{G}_{g}$, $p^{AG}_{ag}$, $p^{BG}_{bg}$, and $p^{CG}_{cg}$ can be produced using numerical methods, such as the Expectation-Maximization (EM) algorithm. The term mixture models refers to the class of statistical models that is appropriate for populations with unlabeled subpopulations. Latent class models are mixture models that are appropriate when data are categorical and, typically, include the assumption of conditional independence.
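One way the EM iterations might look for three categorical variables is sketched below in Python with NumPy. This is an illustrative implementation under stated assumptions, not the procedure of any particular software package: the function name `fit_lca_em`, the random starting values, the fixed number of iterations, and the synthetic table of counts are all hypothetical.

```python
import numpy as np

def fit_lca_em(counts, n_classes, n_iter=500, seed=0):
    """EM sketch for a latent class model on a 3-way table of counts n_abc.

    counts: array of shape (I, J, K), one cell per response pattern.
    Returns class proportions, the three conditional probability tables
    (each with one row per class), and the log-likelihood.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_a, n_b, n_c = counts.shape
    # Assumed starting values: equal class proportions, random tables
    # whose rows each sum to 1.
    pi = np.full(n_classes, 1.0 / n_classes)
    pa = rng.dirichlet(np.ones(n_a), size=n_classes)
    pb = rng.dirichlet(np.ones(n_b), size=n_classes)
    pc = rng.dirichlet(np.ones(n_c), size=n_classes)

    def joint_table():
        # joint[g, a, b, c] = pi[g] * pa[g, a] * pb[g, b] * pc[g, c]
        return np.einsum("g,ga,gb,gc->gabc", pi, pa, pb, pc)

    for _ in range(n_iter):
        joint = joint_table()
        marginal = joint.sum(axis=0)  # pattern probabilities (Equation 1)
        # E-step: posterior class probabilities for each pattern, then
        # expected counts of each pattern within each class.
        posterior = joint / np.maximum(marginal, 1e-300)
        expected = posterior * counts
        # M-step: re-estimate every proportion from the expected counts.
        class_totals = np.maximum(expected.sum(axis=(1, 2, 3)), 1e-300)
        pi = class_totals / counts.sum()
        pa = expected.sum(axis=(2, 3)) / class_totals[:, None]
        pb = expected.sum(axis=(1, 3)) / class_totals[:, None]
        pc = expected.sum(axis=(1, 2)) / class_totals[:, None]

    # Log-likelihood: sum of n_abc * log(p_abc) over observed patterns.
    marginal = joint_table().sum(axis=0)
    observed = counts > 0
    loglik = float((counts[observed] * np.log(marginal[observed])).sum())
    return pi, pa, pb, pc, loglik

# Synthetic counts (not real data): expected cell counts for 1,000
# respondents from a hypothetical two-class model, three binary variables.
true_pi = np.array([0.6, 0.4])
true_a = np.array([[0.9, 0.1], [0.2, 0.8]])
true_b = np.array([[0.8, 0.2], [0.3, 0.7]])
true_c = np.array([[0.7, 0.3], [0.1, 0.9]])
counts = 1000 * np.einsum("g,ga,gb,gc->abc", true_pi, true_a, true_b, true_c)

pi, pa, pb, pc, loglik = fit_lca_em(counts, n_classes=2)
_, _, _, _, loglik_one_class = fit_lca_em(counts, n_classes=1)
```

Fitting with `n_classes=1` reduces to the model in which the three variables are independent, which gives a baseline log-likelihood for comparison. Because the classes are unlabeled, the order in which they are returned is arbitrary, and in practice several random starts are usually tried because EM can converge to a local maximum.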

As a hypothetical example, suppose that a group of 400 college seniors is asked to rate their level of agreement on a 3-point scale (1 = disagree or strongly disagree, 2 = somewhat agree or somewhat disagree, 3 = agree or strongly agree) with the following three statements: “I worry about my grades,” “I worry about whether people like me,” “I worry about money.” The largest two counts in Table 1 correspond to disagreeing with all three statements (98) and agreeing with all three statements (109).

...
