Missing Data, Imputation of

Randy Bartlett

doi:10.4135/9781412961288

In: Encyclopedia of Research Design
Chapter DOI: https://doi.org/10.4135/9781412961288.n243
Subject: Anthropology, Business and Management, Criminology and Criminal Justice, Communication and Media Studies, Counseling and Psychotherapy, Economics, Education, Geography, Health, History, Marketing, Nursing, Political Science and International Relations, Psychology, Social Policy and Public Policy, Social Work, Sociology, Technology, Medicine

Imputation replaces missing values, or missings, with estimated values. In a sense, imputation is a prediction solution. It is one of three options for handling missing data; the others are deletion and segmentation. The general principle is to delete when the data are expendable, impute when the data are precious, and segment in the less common situation in which a large data set has a large fissure. Imputation is measured against deletion; it is advantageous when it affords the more accurate data analysis of the two. This entry discusses the differences between imputing and deleting, the types of missings, the criteria for preferring imputation, and various imputation techniques. It closes with application suggestions.

Figure 1 Missing Data Structure

Impute or Delete

The choice between imputing and deleting is a trade-off between inconvenience and bias. There are two choices for deletion (casewise or pairwise) and several approaches to imputation. Casewise deletion omits entire observations (or cases) with a missing value from all calculations. Pairwise deletion omits observations on a variable-by-variable basis, so an observation is dropped only from the calculations that involve a variable on which it is missing. Casewise deletion sacrifices partial information either for convenience or to accommodate certain statistical techniques. Techniques such as structural equation modeling may require complete data for all the variables, so only casewise deletion is possible for them. For techniques such as calculating correlation coefficients, pairwise deletion will leverage the partial information of the observations, which can be advantageous when one is working with small sample sizes and when missings are not random.
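
The difference between the two deletion choices can be seen in a small sketch. The data, the variable names, and the helper functions casewise and pairwise below are hypothetical and serve only as an illustration; None marks a missing value.

```python
from itertools import combinations

# Hypothetical toy data; None marks a missing value.
rows = [
    {"age": 34, "income": 52.0, "score": 7.1},
    {"age": 41, "income": None, "score": 6.4},
    {"age": 29, "income": 38.5, "score": None},
    {"age": 55, "income": 61.0, "score": 8.0},
    {"age": None, "income": 47.0, "score": 5.9},
]
variables = ["age", "income", "score"]


def casewise(rows, variables):
    """Keep only observations with no missing value on any variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def pairwise(rows, x, y):
    """Keep observations that are complete on the two variables in use."""
    return [r for r in rows if r[x] is not None and r[y] is not None]


complete_cases = casewise(rows, variables)
pair_counts = {
    (x, y): len(pairwise(rows, x, y)) for x, y in combinations(variables, 2)
}

print("casewise n:", len(complete_cases))
for pair, n in pair_counts.items():
    print("pairwise n for", pair, ":", n)
```

In this toy data set, casewise deletion leaves two usable observations, whereas each pairwise calculation retains three, which is the partial information that pairwise deletion leverages.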

Imputation is the more advantageous technique when (a) the missings are not random, (b) the missings represent a large proportion of the data set, or (c) the data set is small or otherwise precious. If the missings do not occur at random, which is the most common situation, then deleting can create significant bias. For some situations, it is possible to repair the bias through weighting, as in poststratification for surveys. If the data set is small or otherwise precious, then deleting can severely reduce the statistical power or value of the data analysis.
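
The weighting repair can be sketched with a minimal poststratification example. All figures are hypothetical: two strata with known population shares, in which the younger stratum is assumed to be underrepresented among respondents because of nonrandom missingness. Each stratum's weight is its population share divided by its share of the respondents.

```python
# Hypothetical figures: known population shares, respondent counts,
# and the mean response within each stratum.
population_share = {"under_40": 0.50, "40_plus": 0.50}
respondents = {"under_40": 20, "40_plus": 80}
stratum_mean = {"under_40": 30.0, "40_plus": 50.0}

n = sum(respondents.values())

# Poststratification weight: population share / sample share.
weights = {
    s: population_share[s] / (respondents[s] / n) for s in respondents
}

# Unweighted mean over respondents (tilted toward the overrepresented stratum).
unweighted_mean = sum(respondents[s] * stratum_mean[s] for s in respondents) / n

# Weighted mean restores each stratum to its population share.
weighted_total = sum(weights[s] * respondents[s] * stratum_mean[s] for s in respondents)
weight_sum = sum(weights[s] * respondents[s] for s in respondents)
weighted_mean = weighted_total / weight_sum

print("weights:", weights)
print("unweighted mean:", unweighted_mean)
print("weighted mean:", weighted_mean)
```

With these assumed numbers, the unweighted mean is 46 and the weighted mean is 40, the value implied by equal population shares. The repair works only to the extent that respondents within a stratum resemble the nonrespondents in that stratum.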

Imputation can repair the missing data by creating one or more versions of how the data set should appear. By leveraging external knowledge, good technique, or both, it is possible to reduce bias due to missing values. Some techniques offer a quick improvement over deletion. Software is making these techniques faster and sharper; however, the techniques should be conducted by those with appropriate training.
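
As a minimal sketch of a quick technique, the example below fills each missing value with the mean of the observed values. Mean substitution is assumed here purely for illustration and is not named in this portion of the entry; it preserves the sample size and the observed mean but understates the variability of the data. The values are hypothetical.

```python
from statistics import mean

# Hypothetical incomes; None marks a missing value.
incomes = [52.0, None, 38.5, 61.0, 47.0, None]

# Estimate the fill value from the observed values only.
observed = [v for v in incomes if v is not None]
fill_value = mean(observed)

# Create one completed version of the data set.
imputed = [fill_value if v is None else v for v in incomes]

print("observed n:", len(observed), "imputed n:", len(imputed))
print("fill value:", fill_value)
```

Casewise deletion would leave four observations here; the completed data set retains all six.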

Categorizing Missingness

Missingness can be categorized in two ways: the physical structure of the missings and the underlying nature of the missingness. First, the structure of the missings can be due to item or unit missingness, the merging of structurally different data sets, or barriers attributable to the data collection tools. Item missingness refers to the situation in which a single value is missing for a particular observation, and unit missingness refers to the situation in which all the values for an observation are missing. Figure 1 provides an illustration of missingness.
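
The distinction between item and unit missingness can be sketched as a simple classification of observations. The data and the function classify are hypothetical. As an assumption of this sketch, any observation with some but not all values missing is labeled item missingness, even when more than one value is missing.

```python
# Hypothetical survey responses keyed by respondent ID; None marks a missing value.
responses = {
    "r1": {"age": 34, "income": 52.0, "score": 7.1},
    "r2": {"age": 41, "income": None, "score": 6.4},
    "r3": {"age": None, "income": None, "score": None},
    "r4": {"age": None, "income": None, "score": 8.0},
}


def classify(values):
    """Label one observation by how many of its values are missing."""
    n_missing = sum(v is None for v in values.values())
    if n_missing == 0:
        return "complete"
    if n_missing == len(values):
        return "unit missingness"  # every value is missing
    return "item missingness"      # some, but not all, values are missing


structure = {rid: classify(vals) for rid, vals in responses.items()}

for rid, label in structure.items():
    print(rid, label)
```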

Table 1 Underlying Nature of Missingness

Second, missings can be categorized by the underlying nature of the missingness. The three categories are (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR), which are summarized in Table 1 and discussed below.
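
A small simulation can illustrate the three categories. The sketch assumes the standard textbook definitions, which may be worded differently in Table 1: under MCAR the chance of a missing value is unrelated to any data, under MAR it depends only on observed data, and under MNAR it depends on the unobserved value itself. The variables, probabilities, and thresholds are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: age is always observed; income may go missing.
ages = [rng.randint(20, 69) for _ in range(1000)]
incomes = [20.0 + 0.8 * a + rng.gauss(0, 5) for a in ages]


def mcar_mask():
    # MCAR: every income has the same chance of being missing.
    return [rng.random() < 0.3 for _ in incomes]


def mar_mask():
    # MAR: the chance depends only on the observed variable (age).
    return [rng.random() < (0.6 if a >= 50 else 0.1) for a in ages]


def mnar_mask():
    # MNAR: the chance depends on the income value that goes missing.
    return [rng.random() < (0.6 if y > 60 else 0.1) for y in incomes]


def observed_mean_of(mask):
    # Mean of the incomes that remain after the missings are dropped.
    return mean(y for y, is_missing in zip(incomes, mask) if not is_missing)


full_mean = mean(incomes)
observed_mean = {
    "MCAR": observed_mean_of(mcar_mask()),
    "MAR": observed_mean_of(mar_mask()),
    "MNAR": observed_mean_of(mnar_mask()),
}

print("full mean:", round(full_mean, 2))
for name, value in observed_mean.items():
    print(name, "observed mean:", round(value, 2))
```

In this setup, deleting the missings leaves the mean roughly unchanged under MCAR but pulls it downward under MAR and MNAR, because higher incomes are more likely to be missing in both. Under MAR the shortfall can in principle be repaired by using age; under MNAR the observed data alone do not reveal it.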

...
