Cluster Analysis
Social science DATA sets usually take the form of observations on UNITS OF ANALYSIS for a set of VARIABLES. The goal of cluster analysis is to produce a simple classification of units into subgroups based on the information contained in some of those variables. The vagueness of this statement is not accidental. Although there may be no formal definition of cluster analysis, a slightly more precise statement is possible. The clustering problem is the task of establishing clusterings of the n units into r clusters (where r is much smaller than n) such that units in the same cluster are similar, whereas units in distinct clusters are dissimilar. Put differently, these clusterings have homogeneous clusters that are well separated. Cluster analysis is a label for the diverse set of tools for solving the clustering problem (see Everitt, Landau, & Leese, 2001). Most often, these tools are used for inductive exploration of data, in the hope that the clusterings provide insight into the structure of the data, the nature of the units, and the processes generating the variables. For example, cities can be clustered in terms of their social, economic, and demographic characteristics, and people can be clustered in terms of their psychological profiles or other attributes they possess.
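The requirement that clusters be homogeneous and well separated can be made concrete. The sketch below is one illustrative way to do so, not a standard index: it assumes a precomputed dissimilarity matrix and a clustering given as a list of cluster labels, and it summarizes homogeneity and separation by simple averages. The function name `homogeneity_and_separation` and the four-unit data are hypothetical.

```python
from itertools import combinations

def homogeneity_and_separation(d, labels):
    """Return (mean within-cluster, mean between-cluster) dissimilarity.

    d      -- symmetric n x n dissimilarity matrix (list of lists)
    labels -- labels[i] is the cluster to which unit i is assigned
    Assumes at least two clusters and at least one cluster with two units.
    Homogeneous, well-separated clusters give a small first value
    and a large second value.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(d[i][j])
        else:
            between.append(d[i][j])
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical toy data: four units measured on one variable.
x = [1.0, 2.0, 10.0, 11.0]
d = [[abs(a - b) for b in x] for a in x]

print(homogeneity_and_separation(d, [0, 0, 1, 1]))  # (1.0, 9.0)
print(homogeneity_and_separation(d, [0, 1, 0, 1]))  # (9.0, 5.0)
```

The first clustering has homogeneous, well-separated clusters; the second mixes distant units within each cluster and scores worse on both counts.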
Development of Cluster Analysis
Prior to 1960, many clustering problems were solved separately in different disciplines, and progress was fragmented. The early 1960s saw attempts to draw these scattered developments together into general treatments of cluster analysis. Sokal and Sneath (1963) provided an extensive discussion and helped set the framework for the development of cluster analysis as a data-analytic field. Specifying clustering problems is not difficult, nor are the mathematical foundations for expressing and creating most solutions to the clustering problem. The difficulty of cluster analysis lies in the computational complexity of establishing those solutions. As a result, the field has been driven primarily by the evolution of computing technology. Generally, this has been beneficial, with substantive interpretations enriched by useful clusterings. In addition, many technical developments have stemmed from exploring substantive applications in new domains. There are now many national societies of cluster analysts, linked through the International Federation of Classification Societies.
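The computational burden is easy to quantify. For a fixed r, the number of distinct partitions of n units into exactly r nonempty clusters is the Stirling number of the second kind, S(n, r), which grows roughly like r^n / r!. The sketch below (the function name `stirling2` is hypothetical) computes it with the standard recurrence S(n, r) = r·S(n − 1, r) + S(n − 1, r − 1). Even 25 units and 3 clusters admit more than 141 billion partitions, so evaluating a criterion on every feasible clustering quickly becomes impractical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n: int, r: int) -> int:
    """Number of ways to partition n units into exactly r nonempty clusters."""
    if n == r:
        return 1                      # includes S(0, 0) = 1
    if n == 0 or r == 0:
        return 0
    # The last unit either joins one of the r clusters formed by the other
    # n - 1 units, or sits alone beside r - 1 clusters of the other units.
    return r * stirling2(n - 1, r) + stirling2(n - 1, r - 1)

print(stirling2(10, 3))   # 9330
print(stirling2(25, 3))   # 141197991025
```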
Solving Clustering Problems
In general, the clustering problem can be stated as establishing one (or more) clustering(s) with r clusters that minimize a well-defined criterion function over all feasible clusterings. The criterion function provides a measure of fit for every clustering. In practice, however, the criterion function is often left implicit or ignored. In most applications the clustering is a partition, in which each unit belongs to exactly one cluster, but “fuzzy clustering” with overlapping clusters is possible. Once the units of analysis have been selected, there are five broad steps in conducting a cluster analysis:
1. measuring the relevant variables (both QUANTITATIVE VARIABLES and CATEGORICAL VARIABLES can be included, and some form of standardization may be necessary),
2. creating a (dis)similarity MATRIX for an appropriate measure of (dis)similarity,
3. creating one or more clusterings via a clustering algorithm,
4. providing some assessment of the obtained clustering(s), and
5. interpreting the clustering(s) in substantive terms.
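The five steps can be strung together in a toy pipeline. The sketch below is illustrative only and rests on several assumptions: quantitative variables, z-score standardization, Euclidean distance, and, as the criterion function, the total within-cluster pairwise dissimilarity (one simple choice among many). Step 3 is done by exhaustive search over every partition, which guarantees the minimum but is feasible only for a handful of units; practical algorithms such as hierarchical or k-means methods trade that guarantee for speed. All function names and the data are hypothetical, and Step 5, substantive interpretation, has no counterpart in code.

```python
import math
import statistics

def standardize(rows):
    """Step 1: convert each variable (column) to z-scores."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def dissimilarity_matrix(rows):
    """Step 2: Euclidean distances between all pairs of units."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def partitions(n, r):
    """Yield each partition of units 0..n-1 into exactly r nonempty
    clusters once, as a label tuple in first-appearance order."""
    def grow(labels, used):
        i = len(labels)
        if i == n:
            if used == r:
                yield tuple(labels)
            return
        if r - used > n - i:  # too few units left to open the missing clusters
            return
        for lab in range(used):               # join an existing cluster
            yield from grow(labels + [lab], used)
        if used < r:                          # or open a new one
            yield from grow(labels + [used], used + 1)
    yield from grow([], 0)

def criterion(d, labels):
    """Assumed criterion: total within-cluster pairwise dissimilarity."""
    n = len(labels)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def best_clustering(d, r):
    """Step 3: exhaustive search; feasible only for very small n."""
    return min(partitions(len(d), r), key=lambda lab: criterion(d, lab))

# Hypothetical units measured on two variables with very different scales.
data = [(20.0, 0.10), (22.0, 0.12), (21.0, 0.11),
        (60.0, 0.40), (62.0, 0.42), (61.0, 0.41)]

d = dissimilarity_matrix(standardize(data))
labels = best_clustering(d, r=2)
print(labels)                          # (0, 0, 0, 1, 1, 1)
print(round(criterion(d, labels), 3))  # Step 4: report the fit
```

Reporting the criterion value is only a minimal form of Step 4; it says how well the best clustering fits under the chosen measure and criterion, not whether the clusters are substantively meaningful.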
Although all steps are fraught with hazard, Steps 2 and 3 are the most hazardous, and Step 4 is often ignored. In Step 2, dissimilarity measures (e.g., Euclidean, Manhattan, and Minkowski distances) or similarity measures (e.g., CORRELATION and matching COEFFICIENTS) can be used. The choice of measure is critical: Different measures can lead to different clusterings. In Step 3, there are many ALGORITHMS for establishing clusterings. In principle, each combination of measure and algorithm can lead to a different clustering of the units.
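The claim that different measures can lead to different clusterings is easy to demonstrate. The sketch below assumes three hypothetical profiles: B has the same shape as A but a higher level, while C is A reversed. Distances from the Minkowski family (Manhattan for p = 1, Euclidean for p = 2) respond to level and place A nearest C; a correlation-based dissimilarity, 1 − r, responds only to shape and places A nearest B. The function names are illustrative.

```python
import math
import statistics

def minkowski(x, y, p):
    """Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_dissimilarity(x, y):
    """1 - r: 0 for identical profile shapes, 2 for mirror images."""
    return 1 - pearson(x, y)

# Hypothetical profiles of three units on three variables.
profiles = {"A": (1, 2, 3), "B": (11, 12, 13), "C": (3, 2, 1)}

def nearest(unit, measure):
    """Return the other unit least dissimilar to `unit` under `measure`."""
    others = [k for k in profiles if k != unit]
    return min(others, key=lambda k: measure(profiles[unit], profiles[k]))

print(nearest("A", lambda x, y: minkowski(x, y, 1)))  # C
print(nearest("A", lambda x, y: minkowski(x, y, 2)))  # C
print(nearest("A", correlation_dissimilarity))        # B
```

Any algorithm that groups units with their nearest neighbors would therefore pair A with C under the distance measures and with B under the correlation measure.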
...