
In many, if not most, studies, some data that were meant to be collected are missing. For example, in a survey, some people may not respond to all the questions. Or, in a randomized experiment, some units' outcomes may not be measured because of equipment failure. Multiple imputation is one principled method for handling such missing data. The general idea is to fill in the missing data with plausible values, analyze the completed data set, and repeat the process multiple times. The analyses from each completed data set are combined to result in inferences that properly account for the missing data. Multiple imputation has been used in large governmental surveys such as the National Health and Nutrition Examination Survey and the Survey of Consumer Finances, and in numerous studies by individual researchers.

Before we review multiple imputation, it is worthwhile to consider the most common and convenient approach to handling missing data: analyze only the cases that have complete data for the variables of interest. This available cases approach can lead to inaccurate estimates. For a simple illustration of this point, consider the hypothetical data in Table 1 for a random sample of five people. Suppose that the weights of all people at least 6 feet (72 inches) tall are missing, so that the observed weights are 130, 140, and 150, because the height/weight instrument is unable to record information for people of that height. Researchers interested in estimating the population average weight are in trouble if they use only the three available cases: their sample average of 140 pounds is a severe underestimate, compared with 150 pounds for all five people.

Table 1 Hypothetical Data for Illustrating Multiple Imputation

Height (inches)    Weight (pounds)
65                 130
68                 140
70                 150
72                 160
75                 170
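
The underestimate can be checked with a few lines of Python. This is a minimal sketch using the Table 1 values; the variable names are illustrative and not part of the original entry, and the missingness rule (weights unrecorded at 72 inches and above) is an assumption inferred from the observed weights quoted in the text.

```python
# Table 1 values: (height in inches, weight in pounds).
table1 = [(65, 130), (68, 140), (70, 150), (72, 160), (75, 170)]

# Assumed missingness rule: weight is unrecorded for anyone 72 inches or taller.
CUTOFF_INCHES = 72
observed_weights = [w for h, w in table1 if h < CUTOFF_INCHES]
all_weights = [w for _, w in table1]

# Available cases analysis: average only the weights that were recorded.
available_case_mean = sum(observed_weights) / len(observed_weights)
# Benchmark: the average that would be obtained with no missing data.
full_sample_mean = sum(all_weights) / len(all_weights)

print(available_case_mean)  # 140.0
print(full_sample_mean)     # 150.0
```

The 10-pound gap arises because missingness depends on height, and height is related to weight.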

Many times, researchers are interested in relationships among variables, such as regression coefficients. In the hypothetical example, the fitted regression of weight on height obtained using the three available cases results in reasonable (unbiased) estimates of the slope and intercept, because the same regression relationship holds at all heights. However, in data sets with many variables and complicated missing data patterns, using only the available cases might exclude a large fraction of the observations, which could dramatically increase the variability of the estimates. Additionally, different model specifications may be estimated on different subsets of units, making theoretical properties of the resulting inferences nearly impossible to understand and practical comparisons of different models difficult.
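
The regression point can be illustrated by fitting the slope on the three available cases and on all five records. The sketch below uses the closed-form least-squares formulas in plain Python; the helper name `ols_fit` is hypothetical and not from the original entry. With only five points the two fits will not agree exactly, but they should be close.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Table 1 values.
heights = [65, 68, 70, 72, 75]
weights = [130, 140, 150, 160, 170]

# Available cases: the three records whose weights were observed.
_, slope_available = ols_fit(heights[:3], weights[:3])
# Benchmark: all five records, as if nothing were missing.
_, slope_full = ols_fit(heights, weights)

print(round(slope_available, 2), round(slope_full, 2))  # 3.95 4.14
```

Unlike the sample mean, the available cases slope stays near the full-data slope, although it rests on fewer observations.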

Illustration of Multiple Imputation

In contrast to available cases analyses, multiple imputation uses all records for estimation, which takes advantage of the information from partially completed records. To illustrate multiple imputation, we again use the hypothetical example. We first demonstrate how to analyze a set of multiply imputed data sets, and then discuss methods of generating imputations.

Suppose that five plausible values for each missing weight have been generated to create five completed data sets. These are displayed in Table 2, along with the estimated slope and its variance obtained from fitting standard linear regression in each completed data set. Inferences for the population regression slope β are based on three quantities. First, compute the average of the five estimated slopes, which equals 4.01. Second, compute the variance of these five estimated slopes, which equals 0.0523. Third, compute the average of the variances of the slopes, which equals 0.0209. The point estimate of the population slope is 4.01, and the variance associated with this point estimate is (1 + 1/5)(0.0523) + 0.0209 = 0.0836. An approximate 95% confidence interval for β is 4.01 ± 1.96√0.0836.
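
The combining arithmetic can be written as a short Python sketch. The function name `pool_estimates` and its argument names are hypothetical; the inputs are the three summary quantities quoted in the text, since the individual completed-data estimates from Table 2 are not reproduced here. Because those inputs are rounded, the computed total variance may differ from the printed 0.0836 in the last digit.

```python
import math


def pool_estimates(mean_estimate, between_var, within_var, m):
    """Combine results from m completed data sets using the rule in the text.

    mean_estimate: average of the m point estimates
    between_var:   variance of the m point estimates
    within_var:    average of the m estimated variances
    """
    total_var = (1 + 1 / m) * between_var + within_var
    # Normal approximation with multiplier 1.96, as in the text.
    half_width = 1.96 * math.sqrt(total_var)
    return total_var, (mean_estimate - half_width, mean_estimate + half_width)


total_var, (lower, upper) = pool_estimates(4.01, 0.0523, 0.0209, m=5)
print(round(lower, 2), round(upper, 2))  # 3.44 4.58
```

The between-imputation term, inflated by (1 + 1/m), is what carries the extra uncertainty due to the missing data; dropping it would make the interval too narrow.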

...
