Skip to main content icon/video/no-internet

The logit model was developed as a variant of the LOG-LINEAR MODEL to analyze Categorical Variables, parallel to the analysis of CONTINUOUS VARIABLES with regression analysis. The basic formulation of the log-linear and logit models begins with the assumption that we are dealing with a CONTINGENCY TABLE of two or more dimensions (represented by two or more variables), each of which has two or more categories, which may be ordered but are usually assumed to be unordered. The analysis of a two-way contingency table for CATEGORICAL variables is a relatively simple problem covered in basic statistics texts. When tables contain three or more categorical variables, however, it quickly becomes cumbersome to consider all the different ways of arranging the multidimensional tables in the two-dimensional space of a page.

Log-linear models are used to summarize three-way and higher-dimensional contingency tables in terms of the effects of each variable in the table on the frequency of observations in each cell of the table; they are also used to show the relationships among the variables in the table. As described by Knoke and Burke (1980, p. 11), the general log-linear model makes no distinction between independent and dependent variables but in effect treats all variables as dependent variables and models their mutual associations. The expected cell frequencies based on the model (e.g., Fijk for a three-variable table) are modeled as a function of all the variables in the model. The logit model is mathematically equivalent to the log-linear model, but one variable is considered a dependent variable, and instead of modeling the expected cell frequencies, the odds of being in one category (instead of any of the other categories) on the dependent variable is modeled as a function of the other (independent) variables instead of the expected frequency being modeled.

THE GENERAL LOG-LINEAR MODEL

Consider as an example a three-way contingency table in which variable A is sex (0 = male, 1 = female), variable B is race (1 = White, 2 = Black, 3 = other), and variable C is computer use, which indicates whether (0 = no, 1 = yes) the respondent uses a personal computer. The categories of each variable are indexed by i = 0, 1 for A,j = 1, 2, 3 for B, and k = 0, 1 for C. For a trivariate contingency table, the saturated model is the model that includes the effect of each variable, plus the effects of all possible interactions among the variables in the model. Using the notation for log-linear models,

None

where each term on the right-hand side of the equation is multiplied by the other terms, and

Fijk = frequency of cases in cell (i,j,k), which is expected if the model is correct.

η = geometric mean of the number of cases in each cell in the table, η = (f010f020f030f110f120f130f011f021f031f111f121f131)1/12.

τAi= effect of A on cell frequencies (one effect for each category of A; one category is redundant); similarly, τBj and τCk are the respective effects of B and C.

...

locked icon

Sign in to access this content

Get a 30 day FREE TRIAL

  • Watch videos from a variety of sources bringing classroom topics to life
  • Read modern, diverse business cases
  • Explore hundreds of books and reference titles

Sage Recommends

We found other relevant content for you on other Sage platforms.

Loading