Logistic regression is a flexible tool when one variable is identified as the response (dependent) variable and it is categorical. Contingency table analysis and loglinear models can handle only categorical variables, which becomes a constraint when both categorical and continuous variables should be included in a model. Also, most of the time, researchers are interested in a particular variable as the response. More importantly, the total number of variables that a contingency table can handle quickly reaches a limit: a table with five variables, each having only two values, has 2⁵ = 32 cells. Unless the sample is very large, many cells will contain very few cases.
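The cell-count arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper `n_cells` and the sample size of 200 are assumptions made for the example, not values from the text.

```python
def n_cells(levels):
    """Number of cells in a full cross-classification.

    `levels` lists how many values each variable can take.
    """
    total = 1
    for k in levels:
        total *= k
    return total


# Five variables with two values each, as in the text: 2**5 cells.
five_binary = n_cells([2] * 5)

# Assumed sample size, chosen only to illustrate how thin the cells get.
sample_size = 200
avg_per_cell = sample_size / five_binary

print(f"{five_binary} cells, about {avg_per_cell:.2f} cases per cell on average")
```

Even when the cases are spread evenly the average is small, and real data are rarely spread evenly, so some cells will hold far fewer cases than the average.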

Logistic regression is better equipped than ordinary regression for modeling a categorical response variable. The simplest categorical variable has only two different values (“dichotomous” or “binary”), such as pass or fail. A variable with multiple unordered values is multinomial, and a variable with multiple ordered values is ordinal. Different logistic regression models should be used according to the measurement level of the response variable. Here, the focus is on logistic regression with a binary response variable. There can be as many explanatory (independent) variables as needed, and they can be either continuous or categorical.

Several problems will arise if we still use ordinary regression analysis when the response variable is categorical. Consider the following simple linear regression model:

$$Y = \alpha + \beta X + \varepsilon$$

Note that Y can take only two possible values, say, 1 and 0, which puts a constraint on its predicted values. That is, this model will generate many predicted values that are neither 1 nor 0. This causes another problem when there is more than one explanatory variable: any one of the explanatory variables may explain the majority of the very limited variation of the response variable, leaving little room for the others. In addition, the distribution of the error terms cannot be normal because, at any given X, the error can take only two possible values: when Y = 1, ε = 1 − α − βX, and when Y = 0, ε = −α − βX. Another assumption that is violated is constant error variance: the variance of ε changes at different levels of X (formal expressions omitted).
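These problems can be seen in a minimal sketch that fits the simple linear regression Y = α + βX + ε to a 0/1 response by ordinary least squares. The data are invented purely for illustration, and the variance expression p(1 − p) is the standard variance of a 0/1 outcome with probability p, supplied here as an assumption because the text omits the formal expressions.

```python
# Toy data, invented for illustration only: X = 0..9 and a binary Y.
xs = list(range(10))
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares estimates for Y = alpha + beta * X + error.
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
beta = sxy / sxx
alpha = mean_y - beta * mean_x

fitted = [alpha + beta * x for x in xs]

# At each X the residual is either 1 - fitted (Y = 1) or -fitted (Y = 0).
residuals = [y - f for y, f in zip(ys, fitted)]

# Predicted values are neither 0 nor 1, and some fall outside [0, 1].
outside_unit_interval = [f for f in fitted if f < 0 or f > 1]

# Reading a fitted value p as a probability, a 0/1 outcome has variance
# p * (1 - p), which changes with X (computed only where 0 < p < 1).
implied_variance = [(x, f * (1 - f)) for x, f in zip(xs, fitted) if 0 < f < 1]

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print("fitted values outside [0, 1]:", [round(f, 3) for f in outside_unit_interval])
for x, v in implied_variance:
    print(f"X = {x}: implied error variance = {v:.3f}")
```

The fitted line crosses below 0 at the low end of X and above 1 at the high end, and the implied error variance is largest where the fitted value is near 0.5.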

These problems arise because the relationship between X and Y is no longer linear. A transformation of the categorical response variable is needed so that it can be predicted by a linear relationship with the explanatory variables. Several transformations have been suggested, but statisticians have found the logit transformation very useful.

Odds and the Logit Transformation

In essence, logistic regression does not directly model the value of the response variable but the probability that a particular value occurs. Let π be the probability that a value occurs; then 1 − π is the probability that it does not occur. The odds is the ratio of these two probabilities:

$$\text{odds} = \frac{\pi}{1 - \pi}$$

Taking the natural logarithm of the odds gives the logistic transformation, or logit for short. The transformed variable can then be modeled as a linear function of the explanatory variables:

$$\operatorname{logit}(\pi) = \ln\!\left(\frac{\pi}{1 - \pi}\right) = \alpha + \beta X$$
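As a small numerical sketch of these definitions, the odds and the logit can be written as Python functions of a probability π. The function names are illustrative, not from the text. The inverse transformation, `inv_logit`, is not stated in the text and is added here as an assumption to show why a linear predictor on the logit scale always maps back to a probability strictly between 0 and 1.

```python
import math


def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Natural logarithm of the odds (requires 0 < p < 1)."""
    return math.log(odds(p))


def inv_logit(z):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.8):
    print(f"pi = {p}: odds = {odds(p):.3f}, logit = {logit(p):.3f}")

# Any real-valued linear predictor maps back into the interval (0, 1).
for z in (-10.0, 0.0, 10.0):
    print(f"logit = {z}: pi = {inv_logit(z):.5f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0. Probabilities below 0.5 give negative logits, and probabilities above 0.5 give positive ones.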

...
