Multiple Imputation


Missing values are a common problem in almost any type of data, whether collected through surveys, clinical trials, or epidemiological studies. If the missingness is not taken into account at the analysis stage, the results can be biased. Multiple imputation is widely accepted as the most convenient strategy for dealing properly with item nonresponse. With multiple imputation, missing values are imputed (i.e., replaced with plausible values given the observed data) more than once. Analyzing each completed dataset and combining the results with simple formulae (Rubin's combining rules) accounts for the extra uncertainty due to nonresponse and ensures valid inferences based on the imputed data. Single imputation typically ignores this extra uncertainty, which produces estimated standard errors and confidence intervals that are too small and p values that are too small, overstating significance.
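Rubin's combining rules can be sketched for a single scalar parameter as follows. This is a minimal illustration, not the entry's own implementation: it assumes each of the m imputed datasets has already been analyzed and returned a point estimate and its squared standard error, and the function name `pool_rubin` is hypothetical. The pooled estimate is the average of the m estimates. The total variance adds the average within-imputation variance to the between-imputation variance, inflated by a factor of (1 + 1/m). The between-imputation term is exactly the uncertainty that single imputation leaves out.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    Hypothetical helper for illustration; assumes a scalar parameter.
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance

    # Rubin's original large-sample degrees of freedom for the t reference
    # distribution; with no between-imputation variability it is infinite.
    if b > 0:
        r = (1 + 1 / m) * b / u_bar    # relative increase in variance due to nonresponse
        df = (m - 1) * (1 + 1 / r) ** 2
    else:
        df = math.inf
    return q_bar, math.sqrt(t), df


# Illustrative (made-up) results from m = 3 imputed datasets.
q, se, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(q, se, df)
```

Here the average within-imputation variance is 0.04, but the pooled standard error is about 0.31 rather than 0.2, because the estimates disagree across imputations. The degrees-of-freedom formula shown is Rubin's original approximation; small-sample refinements exist and are not implemented in this sketch.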

Following a general introduction, this entry discusses the requirements for inference based on partially observed data. The inferential procedures for analyzing multiply imputed datasets are presented next, before the two main approaches for generating multiply imputed datasets are illustrated: joint modeling and sequential regression. Various parametric and nonparametric imputation strategies are then discussed, followed by a simulation study illustrating how multiple imputation would be implemented in practice. Next, the entry discusses practical considerations, such as deciding which variables to include in the imputation models or choosing the number of imputations. The entry concludes with a critical review of the limitations of multiple imputation, a discussion of potential alternatives, and an illustration of applications of the multiple imputation framework beyond the nonresponse context.
