
Logistic regression provides an equation for situations in which the dependent variable is categorical, usually dichotomous (although there is a form of logistic regression for ordinal variables). For example, suppose the dependent variable is the gender of the writer of a specific passage. The predictor variables in the regression equation represent elements of the text, and the goal is to use those elements to predict whether the writer is male or female. The issue is the degree to which one can read a text and correctly identify the gender of the writer. The answer can be thought of as the probability, expressed as a percentage, that a particular writer’s gender is correctly identified from a combination of predictive factors. Normally, the outcome of the equation is reported as the improvement in predicting the value of the outcome when the predictors are used, compared with random chance.

Much like multiple regression, logistic regression generates an equation that improves prediction beyond what any individual variable provides. The key assumption is that some combination of predictors improves the determination of the dependent variable. Unlike general multiple regression, which usually involves a continuous dependent variable, the term logistic regression simply indicates a particular type of dependent variable: one that is dichotomous or categorical, with the result expressed as the ability to classify the outcome correctly. This entry examines logistic analysis in relation to the odds ratio and logistic regression, as well as additional forms of logistic regression.
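As a point of reference, the equation for a dichotomous outcome is commonly written in the following form. This is a standard textbook sketch rather than notation taken from this entry: the symbols p (the probability of one outcome category, for example that the writer is female), x_1 through x_k (the predictors), and b_0 through b_k (the estimated coefficients) are all assumed here for illustration.

```latex
\[
\ln\!\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + \cdots + b_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \cdots + b_k x_k)}}
\]
```

The left-hand form expresses the natural log of the odds as a linear combination of the predictors; the right-hand form converts that combination back into a probability between 0 and 1, which can then be used to classify each case.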

Odds Ratio and Logistic Regression

One consideration in logistic analysis is the nature of prediction and how to examine the improvement in estimation. For example, in a random world where the outcomes are equally likely, the probability that the writer of any text or passage is male (or female) would be 50%. One way to measure the effect of any improvement in prediction is to consider whether the accuracy rate is greater than 50% when some variable or set of variables is used to improve predictability. So, the baseline probability of correct classification due to random chance starts at 50% (assuming that 50% of the writing is generated by each gender). Predictors become useful only when the percentage of accuracy increases beyond 50%. The equation that incorporates the predictors must be evaluated against that random benchmark to determine whether applying it provides a measurable improvement.
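The comparison against a 50% benchmark can be sketched in a few lines of Python. This is a minimal illustration, not part of the entry: the function names are hypothetical, and the 62% accuracy figure is an invented value used only to show the arithmetic.

```python
def probability_to_odds(p):
    """Convert a probability p (0 <= p < 1) into odds in favor, p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in the range [0, 1)")
    return p / (1.0 - p)


def improvement_over_baseline(accuracy, baseline=0.5):
    """Return how far an observed accuracy rate sits above the chance baseline."""
    return accuracy - baseline


# With two equally likely outcomes, the probability is 0.5 and the odds are even.
even_odds = probability_to_odds(0.5)

# Hypothetical accuracy of 62% from some set of predictors (assumed value).
gain = improvement_over_baseline(0.62)

print(f"odds at p = 0.5: {even_odds}")
print(f"improvement over chance: {gain:.2f}")
```

A positive gain indicates that the predictors classify better than the 50% benchmark; a gain at or below zero indicates that they add nothing beyond guessing.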

When the outcomes are not equally likely, prediction is measured against the baseline rate of the outcome rather than against 50%. For example, suppose that in the sample of writing, 75% of the authors were female and 25% were male. The baseline then becomes not 50/50 but 75/25 in terms of gender. What the logistic regression or odds ratio measures is the departure from the accuracy expected under those values. This matters because if one guessed that the writer was female in every case under the 75/25 baseline, the accuracy rate would be 75%. That 75% is the base accuracy for always predicting a female author; always predicting a male author would be accurate only 25% of the time. For this example, we assume that the text is equally probable (in terms of a random chance prediction) at 50%.
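The 75/25 example can be worked through directly. The following is a small sketch under the assumptions stated in the paragraph above (75% female and 25% male authors); the function names and the dictionary layout are hypothetical choices made for illustration.

```python
def constant_guess_accuracy(class_shares, guess):
    """Accuracy obtained by guessing the same category for every case.

    class_shares maps each category label to its proportion in the sample.
    """
    return class_shares[guess]


def majority_baseline(class_shares):
    """Return the most common category and the accuracy of always guessing it."""
    label = max(class_shares, key=class_shares.get)
    return label, class_shares[label]


# Proportions taken from the example in the text.
shares = {"female": 0.75, "male": 0.25}

label, baseline = majority_baseline(shares)
print(f"always guessing '{label}' is correct {baseline:.0%} of the time")
print(f"always guessing 'male' is correct "
      f"{constant_guess_accuracy(shares, 'male'):.0%} of the time")
```

Under these proportions, a set of predictors adds value only if its accuracy exceeds the 75% obtained by always guessing the more common category.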

...
