
Dummy Variable

A dummy, or indicator, variable is any variable in a regression equation that takes on a finite number of values so that different categories of a nominal variable can be identified.

The term dummy simply relates to the fact that the values taken on by such variables (usually values such as 0 and 1) indicate no meaningful measurement but rather the categories of interest.

For example, X1 can indicate the nominal variable “treatment group” (either Treatment A or not Treatment A), and X2 can indicate the nominal variable “sex,” with each variable taking one value (such as 1) for one category and another value (such as 0) for the other.

The following simple rule is always applied to avoid collinearity and the imposition of a monotonic dose-response in the regression model: For an exposure with K distinct levels, one level is first chosen as the baseline or reference group. Refer to that level as Level 0, with the other K − 1 levels referred to as Level 1, Level 2, and so on up to Level K − 1. Then, define K − 1 binary exposure variables as follows: for j = 1, ..., K − 1, Xj = 1 if the exposure is at Level j, and Xj = 0 otherwise.
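The reference-level rule can be illustrated with a minimal Python sketch. The function name dummy_code, the exposure labels, and the choice of "none" as the reference level are all hypothetical and serve only to show the coding; they do not come from any particular study or library.

```python
def dummy_code(values, reference):
    """Return (levels, rows) for K - 1 indicator variables, omitting the reference level."""
    # Collect the non-reference levels in order of first appearance.
    levels = []
    for v in values:
        if v != reference and v not in levels:
            levels.append(v)
    # One row per observation; column j is 1 only when the observation is at levels[j].
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical exposure with K = 3 levels; "none" plays the role of Level 0.
exposure = ["none", "low", "high", "low", "none"]
levels, rows = dummy_code(exposure, reference="none")

print(levels)
for value, row in zip(exposure, rows):
    print(value, row)
```

An observation at the reference level has all K − 1 indicators equal to 0, so its effect is absorbed into the intercept. This is why a K-th indicator would be collinear with the others.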

For example, in his book Statistics for Epidemiology, Nicholas Jewell notes that dummy variables are used for several of the variables measured in the Western Collaborative Group Study of risk factors for coronary heart disease in men:

Let Wt = body weight (lb), on a continuous scale, and choose Wt ≤ 150 as the baseline category. Then, define the following dummy variables:

and

Another example shows how to use dummy variables to compare two straight-line regression equations: 40 males and 30 females are randomly selected to study the association of systolic blood pressure and age. The data set is presented in Table 1.

Table 1 Systolic Blood Pressure (SBP) by Age and by Sex

A first-order regression model with an added interaction term for this example is

where

This single multiple regression model yields the following two models for the two values of X2:
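The algebra behind this step can be sketched as follows. The sketch assumes the usual form of such a model, with Y denoting systolic blood pressure, X1 denoting age, X2 a 0/1 indicator for sex, and ε an error term. Which sex is coded 1 is an assumption left open here, and it does not affect the derivation.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon \\
X_2 = 0:\quad Y &= \beta_0 + \beta_1 X_1 + \varepsilon \\
X_2 = 1:\quad Y &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1 + \varepsilon
\end{align*}
```

Under this coding, β2 is the difference in intercepts and β3 is the difference in slopes between the two groups.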

For the data set, the least-squares regression equation is

For males, the least-squares regression equation is

For females, the least-squares regression equation is

Further statistical hypotheses may be tested with this model. For instance, we may wish to test the null hypothesis that the two regression lines are parallel, which is equivalent to H0: β3 = 0. If β3 = 0, the slope for females equals the slope for males. For these data, the F test does not reject H0, so there is no statistical basis for believing that the two lines are not parallel.
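The F statistic for a parallelism test of this kind can be sketched in plain Python. The sketch relies on the fact that the interaction model is equivalent to fitting a separate line to each group, while the reduced model (β3 = 0) forces a common slope with separate intercepts. The function names and the small data set below are made up for illustration and are not the data of Table 1; the assignment of groups to X2 = 0 and X2 = 1 is likewise an assumption.

```python
def line_fit(x, y):
    """Least-squares line for one group: returns (intercept, slope, Sxx, Sxy, Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return my - slope * mx, slope, sxx, sxy, syy


def parallelism_f(x0, y0, x1, y1):
    """F statistic (1 and n - 4 df) for H0: beta3 = 0, plus the implied coefficients."""
    a0, s0, sxx0, sxy0, syy0 = line_fit(x0, y0)  # group coded X2 = 0
    a1, s1, sxx1, sxy1, syy1 = line_fit(x1, y1)  # group coded X2 = 1
    # Full model: separate lines, so its SSE is the sum of the two residual sums of squares.
    sse_full = (syy0 - s0 * sxy0) + (syy1 - s1 * sxy1)
    # Reduced model: one common (pooled within-group) slope, separate intercepts.
    common = (sxy0 + sxy1) / (sxx0 + sxx1)
    sse_reduced = (syy0 + syy1) - common * (sxy0 + sxy1)
    n = len(x0) + len(x1)
    f = (sse_reduced - sse_full) / (sse_full / (n - 4))
    coef = {"b0": a0, "b1": s0, "b2": a1 - a0, "b3": s1 - s0}
    return f, coef


# Made-up ages and systolic blood pressures for two small groups.
age_g0 = [25, 35, 45, 55, 65]
sbp_g0 = [118, 125, 130, 139, 146]
age_g1 = [22, 33, 41, 52, 63]
sbp_g1 = [110, 119, 127, 138, 150]

f, coef = parallelism_f(age_g0, sbp_g0, age_g1, sbp_g1)
print("F statistic:", f)
print("coefficients:", coef)
```

The statistic would be compared with an F distribution on 1 and n − 4 degrees of freedom; a small value gives no evidence against parallel lines.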

Renjin Tu

Further Readings

Jewell, N. P. (2004). Statistics for epidemiology. Boca Raton, FL: CRC Press.
Kleinbaum, D. G., Kupper, L. L., Muller, K. E., & Nizam, A. (1998). Applied regression analysis and other multivariable methods (2nd ed., p. 264). Belmont, CA: Duxbury Press.