Data Cleaning

Abstract

Data cleaning is the process of quality-checking quantitative data to ensure that a data set contains accurate information. It involves a number of practical steps, such as checking data coding, checking data entry, examining data distributions, and identifying issues such as extreme values. Data cleaning is often an important step in the research process, both to meet the statistical assumptions of analytic techniques and to reduce the impact of errors made during data collection or data entry. This entry provides a detailed overview of data cleaning processes and techniques that support accurate and reliable data analysis, including screening data, dealing with extreme values, dealing with missing or incomplete data, and examining data distributions. Data cleaning is not an objective exercise, and subjective decisions may need to be made during the process. It is therefore imperative that researchers are transparent about their data cleaning processes and decisions.
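As an illustration of the screening steps named above, the following minimal Python sketch flags missing entries, values outside a plausible coding range, and extreme values in a single numeric variable. It is a sketch under stated assumptions, not the entry's prescribed method: the function name `screen_values`, the valid range of 0 to 120, the 1.5 × IQR cutoff for extreme values, and the sample ages are all hypothetical choices made for this example.

```python
from statistics import quantiles


def screen_values(values, valid_min, valid_max):
    """Return the positions of missing, out-of-range, and extreme entries.

    Assumptions (not from the entry): missing data are coded as None, and
    "extreme" means beyond 1.5 * IQR from the quartiles of the in-range values.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]

    # Values outside the plausible range usually point to coding or entry errors.
    out_of_range = [i for i, v in present if not (valid_min <= v <= valid_max)]
    in_range = [(i, v) for i, v in present if valid_min <= v <= valid_max]

    # Quartiles are computed from plausible values only, so that obvious
    # entry errors do not distort the cutoffs for extreme values.
    q1, _, q3 = quantiles([v for _, v in in_range], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme = [i for i, v in in_range if v < low or v > high]

    return {"missing": missing, "out_of_range": out_of_range, "extreme": extreme}


# Made-up ages for illustration: one missing entry, one likely typo (290),
# and one plausible but unusually high value (95).
ages = [34, 29, None, 41, 38, 290, 36, 33, 95, 31]
report = screen_values(ages, valid_min=0, valid_max=120)
print(report)
```

The sketch only reports positions and does not delete or change anything. Whether a flagged value is corrected, excluded, or retained is one of the subjective decisions that should be documented and reported.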
