An important indicator of data quality is the fraction of missing data. Missing data (also called "item nonresponse") means that data on particular items or questions are not available for analysis. In practice, many researchers handle this problem by restricting the analysis to complete cases through "listwise" deletion of all cases with missing data on the variables of interest. However, this results in a loss of information, and estimates will therefore be less efficient. Furthermore, there may be systematic differences between units that respond to a particular question and those that do not, that is, item nonresponse error. If this is the case, the basic assumptions necessary for analyzing only complete cases are not met, and the analysis results may be severely biased.
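
The loss of information under listwise deletion can be made concrete with a minimal sketch. The records, variable names, and the helper listwise_delete below are hypothetical and exist only for illustration; None marks a missing item.

```python
# Hypothetical survey records; None marks item nonresponse.
records = [
    {"age": 34, "income": 52000, "satisfaction": 4},
    {"age": 51, "income": None, "satisfaction": 3},
    {"age": 29, "income": 41000, "satisfaction": None},
    {"age": 46, "income": 67000, "satisfaction": 5},
    {"age": None, "income": 38000, "satisfaction": 2},
]

def listwise_delete(rows, variables):
    """Keep only rows with observed values on every variable of interest."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

variables = ["age", "income", "satisfaction"]
complete = listwise_delete(records, variables)

# Rows dropped because at least one variable of interest is missing.
dropped = [r for r in records if any(r[v] is None for v in variables)]

# Observed values that are thrown away together with the missing ones.
discarded_values = sum(r[v] is not None for r in dropped for v in variables)

print(len(records), len(complete), discarded_values)  # prints: 5 2 6
```

In this toy data set only one value is missing in each incomplete record, yet three of the five cases are dropped and six observed values are discarded with them.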

Modern strategies to cope with missing data are imputation and direct estimation. Imputation replaces the missing values with plausible estimates to make the data set complete. Direct estimation means that all available (incomplete) data are analyzed using a maximum likelihood approach. The increasing availability of user-friendly software will undoubtedly stimulate the use of both imputation and direct estimation techniques.
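
As a rough illustration of what "replacing missing values with plausible estimates" means, the sketch below fills each gap with the mean of the observed values. This is only the simplest conceivable form of imputation, chosen here for brevity; it is not presented as the approach this entry recommends, and the income figures are hypothetical. Direct estimation is not shown.

```python
from statistics import mean, pvariance

# Hypothetical income item; None marks item nonresponse.
incomes = [52000, None, 41000, 67000, 38000, None]

observed = [x for x in incomes if x is not None]
fill = mean(observed)  # single fill-in value: the observed mean

# Completed data set: every missing value is replaced by the same estimate.
imputed = [fill if x is None else x for x in incomes]

# Filling every gap with one value makes the data look less variable
# than the observed values alone suggest.
print(pvariance(imputed) < pvariance(observed))  # prints: True
```

The comparison at the end shows one reason to treat such a simple fill-in with caution: the completed data set is artificially homogeneous, so analyses run on it can overstate precision.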

However, a prerequisite for the statistical treatment of missing data is to understand why the data are missing. For instance, a missing value originating from accidentally skipping a question differs from a missing value originating from a respondent's reluctance to reveal sensitive information. Moreover, the information that is missing can never be replaced. Thus, the first goal in dealing with missing data is to have none: prevention is an important step. Reducing item nonresponse leads to more information in a data set, to more data with which to investigate patterns in the remaining item nonresponse and select the best corrective treatment, and finally to more data on which to base imputation and a correct analysis.

A Typology Of Missing Data

There are several types of missing data patterns, and each pattern can be caused by different factors. The first concern is the randomness or nonrandomness of the missing data.

Missing At Random Or Not Missing At Random

A basic distinction is that data are (a) missing completely at random (MCAR), (b) missing at random (MAR), or (c) not missing at random (NMAR). This distinction is important because it refers to quite different processes that require different strategies in data analysis.

Data are MCAR if the missingness of a variable is unrelated to its unknown value and also unrelated to the values of all other variables. An example is inadvertently skipping a question in a questionnaire. When data are missing completely at random, the missing values are a random sample of all values and are not related to any observed or unobserved variable. Thus, results of data analyses will not be biased, because there are no systematic differences between respondents and nonrespondents, and problems that arise are mainly a matter of reduced statistical power. It should be noted that the standard solutions in many statistical packages, those of listwise and pairwise deletion, both assume that the data are MCAR. However, this is a strong and often unrealistic assumption.
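
The claim that MCAR leaves complete-case estimates unbiased, while systematic nonresponse does not, can be checked with a small simulation. This is a sketch under stated assumptions: the normal distribution, the sample size, and both missingness probabilities are invented for illustration, and the second mechanism is simply one example of missingness that depends on the unknown value itself and therefore violates MCAR.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" answers that would be seen with no nonresponse.
true_values = [rng.gauss(50, 10) for _ in range(20000)]

# MCAR: every value has the same 30% chance of being missing,
# regardless of its own value or anything else.
mcar_observed = [x for x in true_values if rng.random() >= 0.3]

# Not MCAR (assumed mechanism): values above 50 are missing with
# probability 0.6, values at or below 50 with probability 0.1.
non_mcar_observed = [
    x for x in true_values
    if rng.random() >= (0.6 if x > 50 else 0.1)
]

full_mean = mean(true_values)
mcar_mean = mean(mcar_observed)
non_mcar_mean = mean(non_mcar_observed)

print(f"full: {full_mean:.2f}  MCAR: {mcar_mean:.2f}  not MCAR: {non_mcar_mean:.2f}")
```

Under the MCAR mechanism the complete-case mean stays close to the full-data mean and only the number of cases shrinks, which is the loss of statistical power described above. Under the second mechanism the complete-case mean is pulled downward because high values are underrepresented among respondents.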

...
