
Log-Linear Analysis

Log-linear analysis is a multidimensional extension of the classical cross-tabulation chi-square test. While the latter can consider at most two variables at a time, log-linear models can identify complex interactions in multidimensional contingency tables with more than two categorical variables. Log-linear models combine characteristics of cross-tabulation chi-square tests (determining the fit between observed and expected cell counts) with those of analysis of variance (ANOVA; simultaneous testing of main effects and interactions within multifactorial designs), which is why they are sometimes informally referred to as ANOVA for categorical data. Instead of the Pearson chi-square statistic, log-linear models use the likelihood ratio chi-square statistic, which is calculated differently but has approximately the same distribution when the number of observations is large. In this review, log-linear analysis is briefly explained, with particular focus on its data requirements and modeling assumptions.
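To make the difference between the two statistics concrete, the following minimal sketch computes both for an ordinary two-way table. The function name chi_square_statistics and the counts are illustrative assumptions, not taken from the entry; the formulas are the standard Pearson statistic (sum of (observed - expected)^2 / expected) and the likelihood ratio statistic (2 times the sum of observed * ln(observed / expected)).

```python
import math

def chi_square_statistics(observed):
    """Return (Pearson X^2, likelihood ratio G^2, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    pearson_x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were independent
            exp = row_totals[i] * col_totals[j] / n
            pearson_x2 += (obs - exp) ** 2 / exp
            if obs > 0:  # a cell with zero observations contributes 0 to G^2
                g2 += 2.0 * obs * math.log(obs / exp)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson_x2, g2, df

# Hypothetical 2 x 3 table of counts, invented for illustration only
table = [[30, 45, 25],
         [40, 35, 25]]
x2, g2, df = chi_square_statistics(table)
print(f"Pearson X2 = {x2:.3f}, G2 = {g2:.3f}, df = {df}")
```

With reasonably large counts such as these, the two values come out close to each other, which is the sense in which the statistics share approximately the same distribution.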

Applicability

Certain conditions have to be met for log-linear models to be reasonably applied to one’s data. First, all variables under consideration need to be measured at the nominal-scale level (each variable has to come in two or more discrete observational categories). Second, all observations should be independent of one another, meaning that each participant (or sampling unit) should contribute one and only one observation to the data set. This is an important requirement, which also holds for classical chi-square tests. If your research design involves repeated measurements from the same subjects (such that each participant contributes more than one observation to the data), cross-tabulation tests and log-linear models should not be used. Finally, as with cross-tabulation tests in general, log-linear models call for sufficiently large numbers of observations per design cell. As a rule of thumb, all cells in the multidimensional table for analysis should have expected cell counts greater than one, and at least 80% of the design cells should have expected cell counts greater than five.
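The rule of thumb on expected cell counts can be checked mechanically. The sketch below is one possible way to do so, under a stated assumption: expected counts are computed for the simplest model, in which all variables are mutually independent (the expected counts relevant in practice depend on the model actually fitted). The function names and the counts are hypothetical and do not come from the entry.

```python
from collections import defaultdict

def expected_under_independence(counts):
    """Expected counts if all variables were mutually independent.

    counts maps a tuple of category labels (one label per variable) to an
    observed count; every cell of the table must be present, including zeros.
    """
    n = sum(counts.values())
    n_vars = len(next(iter(counts)))
    # One-way marginal totals for every variable
    margins = [defaultdict(float) for _ in range(n_vars)]
    for cell, count in counts.items():
        for var, level in enumerate(cell):
            margins[var][level] += count
    expected = {}
    for cell in counts:
        value = n
        for var, level in enumerate(cell):
            value *= margins[var][level] / n  # multiply by each marginal proportion
        expected[cell] = value
    return expected

def meets_rule_of_thumb(expected):
    """True if every expected count exceeds 1 and at least 80% exceed 5."""
    values = list(expected.values())
    all_above_one = all(v > 1 for v in values)
    share_above_five = sum(v > 5 for v in values) / len(values)
    return all_above_one and share_above_five >= 0.8

# Hypothetical 3 x 2 x 2 table (color, age group, gender), invented for illustration
counts = {
    ("silver", "under 40", "male"): 48, ("silver", "under 40", "female"): 35,
    ("silver", "over 40", "male"): 52,  ("silver", "over 40", "female"): 41,
    ("white", "under 40", "male"): 30,  ("white", "under 40", "female"): 55,
    ("white", "over 40", "male"): 28,   ("white", "over 40", "female"): 47,
    ("black", "under 40", "male"): 60,  ("black", "under 40", "female"): 33,
    ("black", "over 40", "male"): 45,   ("black", "over 40", "female"): 38,
}
print(meets_rule_of_thumb(expected_under_independence(counts)))
```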

Example

The following illustration is based on simulated data. Imagine a car manufacturer is planning a new advertising campaign for its latest SUV. The manufacturer is not sure which color the car should be painted for the advert photographs and, more importantly, whether the color of the car should differ for adverts appearing in magazines that are targeted toward different age groups and genders. The company decides to run a quick online survey in which visitors to its website can click on their favorite color option (out of three depicted suggestions: “silver,” “white,” or “black”) for the car in question. Each visitor can take part only once, and the response is counted after the visitor has also indicated their age group (“over 40” or “under 40”) and gender (“male” or “female”). The imaginary survey is kept online for a few days, after which a total of 526 responses have been recorded, distributed as shown in Table 1. The manufacturer is primarily interested in testing whether color choice depends on specific combinations of age group and gender; in other words, whether age group and gender interact in predicting different color choices.
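One common way to frame this question is to fit a log-linear model that contains all two-way associations but no three-way (color by age group by gender) term, and to measure how badly it fits with the likelihood ratio statistic. The sketch below does this with iterative proportional fitting. It is an illustration under stated assumptions, not the entry's own analysis: the counts are invented and are not the values of Table 1, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def two_way_margin(table, keep):
    """Sum a three-way table (dict keyed by 3-tuples) over the variable not in `keep`."""
    margin = defaultdict(float)
    for cell, value in table.items():
        margin[tuple(cell[k] for k in keep)] += value
    return margin

def fit_no_three_way(observed, tol=1e-8, max_iter=1000):
    """Fitted counts for the model with all two-way terms but no three-way term.

    Uses iterative proportional fitting; assumes every observed two-way
    margin is positive so that no division by zero occurs.
    """
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(max_iter):
        largest_change = 0.0
        for keep in [(0, 1), (0, 2), (1, 2)]:
            obs_margin = two_way_margin(observed, keep)
            fit_margin = two_way_margin(fitted, keep)
            for cell in fitted:
                key = tuple(cell[k] for k in keep)
                # Rescale so this fitted two-way margin matches the observed one
                new_value = fitted[cell] * obs_margin[key] / fit_margin[key]
                largest_change = max(largest_change, abs(new_value - fitted[cell]))
                fitted[cell] = new_value
        if largest_change < tol:
            break
    return fitted

def likelihood_ratio_g2(observed, fitted):
    """G^2 = 2 * sum(observed * ln(observed / fitted)); empty cells contribute 0."""
    return 2.0 * sum(o * math.log(o / fitted[c]) for c, o in observed.items() if o > 0)

# Hypothetical counts keyed by (color, age group, gender); NOT the data of Table 1
counts = {
    ("silver", "under 40", "male"): 48, ("silver", "under 40", "female"): 35,
    ("silver", "over 40", "male"): 52,  ("silver", "over 40", "female"): 41,
    ("white", "under 40", "male"): 30,  ("white", "under 40", "female"): 55,
    ("white", "over 40", "male"): 28,   ("white", "over 40", "female"): 47,
    ("black", "under 40", "male"): 60,  ("black", "under 40", "female"): 33,
    ("black", "over 40", "male"): 45,   ("black", "over 40", "female"): 38,
}
fitted = fit_no_three_way(counts)
g2 = likelihood_ratio_g2(counts, fitted)
# Degrees of freedom of the omitted three-way term: product of (levels - 1)
levels = [len({cell[v] for cell in counts}) for v in range(3)]
df = (levels[0] - 1) * (levels[1] - 1) * (levels[2] - 1)
print(f"G2 = {g2:.3f} on {df} degrees of freedom")
```

A G2 value that is large relative to a chi-square distribution with the printed degrees of freedom would suggest that the three-way term is needed, that is, that the association between color choice and one demographic variable differs across levels of the other.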

...
