Dummy Variables

Brandon LeBeau

doi:10.4135/9781506326139

In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation
Chapter DOI: https://doi.org/10.4135/9781506326139.n212
Subject: Education

Dummy variables, sometimes referred to as indicator variables, represent a categorical (or qualitative) variable as a series of dichotomous (i.e., 0/1) variables; creating them is a common data preparation step. This technique is useful for recreating an analysis of variance model in a regression framework, which is achieved by creating c − 1 new dichotomous variables from a categorical variable, where c represents the number of groups, categories, or levels of the original categorical variable. For example, a variable representing whether a student graduated from high school (i.e., graduated vs. did not graduate) can be created by assigning a value of 1 if the student graduated from high school and 0 otherwise. This entry explores in more detail the creation, interpretation, and reasons for using dummy variables.
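The graduation example above can be sketched in a few lines of Python. This is only an illustration: the list of statuses and the helper name to_indicator are assumptions made for the sketch and do not come from the entry.

```python
# Hypothetical graduation records; the values are illustrative only.
statuses = ["graduated", "did not graduate", "graduated", "graduated"]


def to_indicator(values, target):
    """Return 1 where a value equals the target category and 0 otherwise."""
    return [1 if value == target else 0 for value in values]


# With c = 2 categories, c - 1 = 1 dichotomous variable is enough.
graduated = to_indicator(statuses, "graduated")
print(graduated)  # [1, 0, 1, 1]
```

Because the original variable has only two categories, a single 0/1 variable carries all of its information.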

Creating Dummy Variables

Creating dummy variables is an important data preparation step that is mostly used for fitting linear regression models; however, it is also useful for graphical or tabular displays. When creating dummy variables for tables or figures, it is helpful to create dummy variables for all the categories of the original variable.

For example, suppose the grade levels of eight students were collected. This variable could be represented as the grade level each student is currently in (as shown in the leftmost column of the matrix in Figure 1). These eight students were in Grades 7, 8, or 9. The grade level of a student is represented by an integer; however, you could argue that the variable is only ordinal in nature. That is, the differences between adjacent values on the scale may not be equal; for example, the difference (growth) between seventh and eighth grade may not be the same as the difference between eighth and ninth grade. In these situations, dummy variables offer an alternate representation of the data.

Figure 1 Matrix of dummy variables, showing categorical variables

The dummy variables created from the grade level of the eight students are shown in the right matrix of Figure 1, labeled Grade7, Grade8, and Grade9, one variable for each of Grades 7, 8, and 9. As can be seen in Figure 1, to create the Grade7 variable, any student recorded as being in Grade 7 on the left side of the equation is represented by a value of 1 on the right side of the equation, whereas a student in any other grade is represented by a 0. The same logic was used to create the Grade8 and Grade9 variables. Dummy variables are also referred to as indicator variables because the new variables in the right matrix indicate the group (i.e., grade) to which the student belongs.
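The construction described above can be sketched as follows. The eight grade values are hypothetical stand-ins (the actual values in Figure 1 are not reproduced here), and make_dummies is an illustrative helper name, not a function from any particular package.

```python
# Hypothetical grade levels for eight students; not the values from Figure 1.
grades = [7, 8, 9, 7, 8, 9, 9, 7]


def make_dummies(values, prefix):
    """Build one 0/1 column per distinct category, named prefix + category."""
    categories = sorted(set(values))
    return {
        f"{prefix}{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


dummies = make_dummies(grades, "Grade")
for name, column in dummies.items():
    print(name, column)
# Grade7 [1, 0, 0, 1, 0, 0, 0, 1]
# Grade8 [0, 1, 0, 0, 1, 0, 0, 0]
# Grade9 [0, 0, 1, 0, 0, 1, 1, 0]
```

Each student has a 1 in exactly one of the three columns, which is what makes the full set suitable for tables and figures.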

When creating dummy variables for regression models, only c − 1 new dichotomous variables are needed when an intercept is included in the linear model. (In fact, using all c categories as predictors creates a perfect linear relationship among the predictors, because the full set of dummy variables sums to the intercept column, and software will be unable to estimate a unique set of coefficients.) This is shown in Figure 2, where only the variables Grade7 and Grade8 were created. The level that is not coded as a dummy variable (i.e., Grade9 from the previous matrix) is referred to as the reference group. From a mathematical perspective, it does not matter which level of the categorical variable is used as the reference group; instead, the choice of reference group is driven by the research question of interest. More information on this will be given in the following section on interpreting dummy variables.
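A minimal sketch of the c − 1 coding, and of the perfect relationship that motivates it, is given below. The grade values are hypothetical, and make_dummies with its reference argument is an illustrative helper introduced for this sketch.

```python
# Hypothetical grade levels for eight students; not the values from the figures.
grades = [7, 8, 9, 7, 8, 9, 9, 7]


def make_dummies(values, prefix, reference=None):
    """Build 0/1 columns for every category except the reference level."""
    categories = [c for c in sorted(set(values)) if c != reference]
    return {
        f"{prefix}{c}": [1 if value == c else 0 for value in values]
        for c in categories
    }


full = make_dummies(grades, "Grade")                  # c columns
reduced = make_dummies(grades, "Grade", reference=9)  # c - 1 columns

# With all c columns, each row sums to 1, which duplicates the intercept column.
intercept = [1] * len(grades)
row_sums = [sum(row) for row in zip(*full.values())]
print(row_sums == intercept)  # True

# Dropping the reference level (Grade 9) removes that redundancy.
print(list(reduced))  # ['Grade7', 'Grade8']
```

In the reduced coding, a student in the reference group is identified by a 0 on both remaining variables, so no information is lost by omitting Grade9.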

...
