Principal Components Analysis

Abstract

Principal component analysis (PCA) is a technique that transforms a set of observed, correlated variables into a set of unobserved, uncorrelated components. This enables a data set containing many individual variables to be described using a small number of components that capture much of the variation in the data. PCA has a long history in statistics and has been applied in many disciplines, including biology, astronomy, geography, the social sciences, meteorology and management. In addition to reducing the number of variables required to describe a data set, PCA can also identify underlying mechanisms that may have played a role in determining the structure of the data (i.e., the underlying “causes”). Reducing a large number of variables to a relatively small number of components also makes a data set easier to analyse and describe using other techniques. In particular, because the components identified by PCA are uncorrelated, many of the problems associated with multicollinearity are alleviated, making regression models easier to interpret. This entry provides a relatively nontechnical and practical introduction to the application of PCA using a readily available data set and open-source software.
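
The central idea, that correlated variables can be re-expressed as uncorrelated components, the first of which carries most of the variation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the software or the data set used in the entry, so Python is chosen here arbitrarily, the data are synthetic, and the function names (`make_correlated_data`, `pca_2d`) are hypothetical. The sketch handles only two variables, where the components can be found by a single rotation of the centred data.

```python
import math
import random


def make_correlated_data(n=200, seed=0):
    """Synthetic illustration data (NOT the data set used in the entry)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y depends strongly on x, so the two variables are correlated.
    ys = [0.8 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def correlation(u, v):
    """Pearson correlation coefficient."""
    return covariance(u, v) / math.sqrt(covariance(u, u) * covariance(v, v))


def pca_2d(xs, ys):
    """Two-variable PCA on the covariance matrix.

    For two variables, the principal axes are obtained by rotating the
    centred data through the angle that makes the covariance zero.
    Returns the scores on the first and second components.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cx = [x - mean_x for x in xs]
    cy = [y - mean_y for y in ys]
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return pc1, pc2


if __name__ == "__main__":
    xs, ys = make_correlated_data()
    pc1, pc2 = pca_2d(xs, ys)
    total = covariance(xs, xs) + covariance(ys, ys)
    print("correlation of original variables:", round(correlation(xs, ys), 3))
    print("correlation of components:", round(correlation(pc1, pc2), 3))
    print("share of variance on component 1:", round(covariance(pc1, pc1) / total, 3))
```

With more than two variables the same result is usually obtained from an eigendecomposition or singular value decomposition of the covariance (or correlation) matrix rather than from an explicit rotation angle; the two-variable case is used here only because it needs nothing beyond the standard library.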
