
Correspondence Analysis

Correspondence analysis (CA) is a quantitative data analysis method that offers researchers a visual understanding of relationships between qualitative (i.e., categorical) variables. Even though CA is closely related to the chi-square statistic (χ²), it is not an inferential method for directly testing theories and hypotheses. Instead, CA is a descriptive data reduction technique, similar to principal components analysis (PCA). Performing a CA with computer software gives researchers an easily interpreted graphical representation of the cross-tabulated data in a contingency table. Because widely used statistical methods seldom consider relationships between categorical variables, many such relationships go unnoticed in datasets. Although CA is a descriptive method, identifying previously unnoticed relationships can motivate future hypothesis testing. This entry provides background on the history of CA and identifies key concepts in CA such as profile, mass, centroid, chi-square distance, and inertia.

Analysis of Categorical Data

Although CA is sometimes described as a relatively new approach to multivariate statistics, methods of analyzing categorical variables similar to CA were identified even before the 20th century. Prior to advancements in computer processing power, however, applications of CA were limited. Aside from a few scattered documented uses in the first half of the 20th century, few researchers used CA as a statistical method until the 1960s and 1970s. During that period, CA attracted the interest of researchers and statisticians in Europe, particularly in France, with Jean-Paul Benzécri and Michael Greenacre primarily credited for popularizing the approach. By the end of the 20th century, researchers in disciplines ranging from the social sciences to medicine had applied CA in their research.

CA is similar in several ways to PCA. Much like PCA, the purpose of CA is to reduce complex data that exist in many dimensions to fewer dimensions. In CA, the data are typically reduced to two dimensions so that the result can be plotted and each dimension is fairly easy to interpret. Also like PCA, CA uses singular value decomposition (SVD) and eigenvalues to determine how much of the variation in the data is explained by each dimension. Researchers then examine the patterns along each dimension to determine what that dimension describes. Unlike PCA, which is typically conducted on continuous variables (often assumed to be normally distributed), CA is conducted on categorical variables, which are inherently not normally distributed. However, the values in a CA must all be on the same scale (e.g., counts or frequencies).
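The role of SVD and eigenvalues described above can be illustrated with a minimal sketch in Python using NumPy. It follows the standard textbook formulation of simple CA (SVD of the matrix of standardized residuals), not a procedure given in this entry; the function name correspondence_analysis and the example table are hypothetical and chosen only for illustration.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple CA of a two-way contingency table (illustrative sketch)."""
    table = np.asarray(table, dtype=float)
    P = table / table.sum()              # correspondence matrix (proportions)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: departure from independence, scaled by the masses.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                    # principal inertias (eigenvalues)
    explained = inertia / inertia.sum()  # share of total inertia per dimension
    # Principal coordinates of the row and column categories.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, inertia, explained

# Hypothetical counts: 3 row categories cross-tabulated with 3 column categories.
table = np.array([
    [20, 10,  5],
    [10, 20, 10],
    [ 5, 10, 20],
])

row_coords, col_coords, inertia, explained = correspondence_analysis(table)
print("share of inertia per dimension:", np.round(explained, 3))
print("row coordinates (first two dimensions):")
print(np.round(row_coords[:, :2], 3))
```

Plotting the first two columns of row_coords and col_coords on the same axes gives the two-dimensional map usually reported in a CA. The values in explained indicate how much of the total inertia those two dimensions retain.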

A simple correspondence analysis can be used with a contingency table of two categorical variables. A more advanced technique, referred to as multiple correspondence analysis (MCA), can be used to examine several categorical variables. A contingency table cross-tabulates the categories of one variable (the rows) against the categories of another (the columns). The χ² statistic can then be used to compare the observed cell values with the values expected if the two variables were independent. Though the χ² statistic can be used to determine whether a statistically significant relationship exists between categorical variables, χ² does not provide details about the nature of the relationship. CA can offer insight into that relationship by displaying, on a map, which categories tend to occur together.
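The comparison of observed and expected cell values can be sketched as follows, again in Python with NumPy. The table is a hypothetical example, and the helper name chi_square is an assumption made for illustration. The last line uses the standard identity that the total inertia analyzed by CA equals χ² divided by the table total.

```python
import numpy as np

# Hypothetical contingency table: rows and columns are the categories
# of two categorical variables; cells hold observed counts.
observed = np.array([
    [20, 10,  5],
    [10, 20, 10],
    [ 5, 10, 20],
], dtype=float)

def chi_square(table):
    """Return the chi-square statistic and the expected counts under independence."""
    n = table.sum()
    row_totals = table.sum(axis=1, keepdims=True)   # shape (rows, 1)
    col_totals = table.sum(axis=0, keepdims=True)   # shape (1, cols)
    expected = row_totals @ col_totals / n          # expected count in each cell
    statistic = ((table - expected) ** 2 / expected).sum()
    return statistic, expected

statistic, expected = chi_square(observed)
total_inertia = statistic / observed.sum()          # total inertia = chi-square / n

print("chi-square statistic:", round(statistic, 3))
print("total inertia:", round(total_inertia, 4))
```

The statistic alone summarizes how far the table departs from independence. The per-cell differences between observed and expected show where that departure comes from, which is the information a CA map displays.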

...
