Cluster Analysis


Cluster analysis is a flexible, exploratory, person-centered technique that groups cases (often individuals) into clusters. Cases in the same cluster are more similar to one another, or closer in geometric space, than they are to cases in other clusters. This flexibility allows for the identification of complex combinations of characteristics and the exploration of individual differences on a small or large number of variables simultaneously. This entry discusses uses of cluster analysis in the social sciences as well as best practices in conducting cluster analysis based on previous research. Many cluster analysis methods exist; the most commonly used in social science research are hierarchical agglomerative cluster analysis and k-means cluster analysis. Hierarchical agglomerative cluster analysis tends to be used to narrow the number of clusters considered for the final solution, whereas k-means is often used to identify the final cluster solution. In running a cluster analysis, investigators must make several critical decisions, including (a) which variables to include in the analysis, (b) which type of cluster analysis to use, (c) whether and how to standardize variables, (d) which distance metric to use, (e) which linkage method to use for hierarchical methods and which starting values to use for partitional methods, and (f) how many clusters to retain in the final solution. Best practices for each step are reviewed, and examples are given whenever possible. Although cluster analysis is criticized as an exploratory, data-driven technique, it can be useful for confirming or refuting theory, and steps can be taken to support the validation and generalizability of the final cluster solution.
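
The two-stage workflow described above (standardize, run a hierarchical agglomerative analysis, then refine with k-means) can be illustrated with a minimal sketch in plain Python. Everything specific here is an assumption made for illustration and is not taken from the entry: the nine toy cases and their two variables are hypothetical, variables are z-scored, distance is squared Euclidean, linkage is Ward's criterion, the number of clusters is fixed at three, and the k-means starting values are the centroids of the hierarchical solution. Applied work would normally rely on an established statistics package rather than hand-written routines.

```python
import math

def standardize(rows):
    """Z-score each column so that no variable dominates the distances.
    Assumes every column has non-zero variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Mean of a list of cases, variable by variable."""
    return [sum(c) / len(c) for c in zip(*points)]

def ward_agglomerate(points, k):
    """Start with one cluster per case and repeatedly merge the pair whose
    merger adds the least within-cluster sum of squares (Ward's criterion)
    until k clusters remain. Returns lists of case indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = [points[m] for m in clusters[i]]
                b = [points[m] for m in clusters[j]]
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans(points, start_centers, max_iter=100):
    """Alternate between assigning each case to its nearest center and
    recomputing the centers, stopping when assignments no longer change."""
    centers = [list(c) for c in start_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical toy data: nine cases measured on two variables with
# very different scales (illustrative values only).
data = [
    (2.0, 55), (3.0, 60), (2.5, 58),
    (10.0, 62), (11.0, 57), (10.5, 60),
    (6.0, 90), (7.0, 95), (6.5, 92),
]

z = standardize(data)                       # (c) standardize variables
hier = ward_agglomerate(z, k=3)             # (e) Ward linkage, (f) k = 3
starts = [centroid([z[i] for i in members]) for members in hier]
labels, centers = kmeans(z, starts)         # refine with k-means
print(labels)
```

Seeding k-means with the hierarchical centroids is one way to handle the choice of starting values for a partitional method; random starts repeated several times are a common alternative. In practice the number of clusters would be compared across several candidate solutions rather than fixed in advance as it is here.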
