Exploratory Data Analysis


Exploratory data analysis (EDA), pioneered by J. W. Tukey in the 1960s, emphasises that data analysis is itself a science, distinct from the confirmation or rejection of hypotheses by a statistical test. EDA stresses the importance of understanding the process that generates the data to be analysed: how that process might structure the data, and how it might give rise to errors within them. Such exploration is fundamental to generating hypotheses about the data or models to summarise them, as opposed to the later confirmation or rejection of those hypotheses or models. Visualisation is central to EDA, both for the initial inspection of data and for the presentation of results. Robust or resistant measures of level and spread, such as the median and the interquartile range, are preferred over measures such as the arithmetic mean and standard deviation, which are mathematically more tractable but sensitive to outlying values. EDA is often compared to detective work that identifies the perpetrator of a crime, as opposed to the proceedings of the trial that decides upon guilt or innocence.
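
The preference for resistant measures can be illustrated with a minimal sketch. The data below are invented for illustration, and the helper names `iqr` and `summarise` are hypothetical; the sketch uses only Python's standard `statistics` module and replaces one value with a gross recording error to show which summaries move.

```python
import statistics


def iqr(values):
    """Interquartile range: distance between the upper and lower quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def summarise(values):
    """Return sensitive (mean, sd) and resistant (median, iqr) summaries."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": iqr(values),
    }


# Invented example data: seven well-behaved measurements.
clean = [10, 11, 12, 13, 14, 15, 16]

# The same batch with the last value mis-recorded (16 entered as 160).
contaminated = clean[:-1] + [160]

clean_summary = summarise(clean)
contaminated_summary = summarise(contaminated)

for name in ("mean", "sd", "median", "iqr"):
    print(f"{name:>6}: clean={clean_summary[name]:.2f}  "
          f"contaminated={contaminated_summary[name]:.2f}")
```

In this small batch the single erroneous value pulls the mean and standard deviation far from their original values, while the median and interquartile range are unchanged. That is the sense in which the latter are called resistant.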
