
Dimension reduction is a collection of statistical methodologies that reduce the dimension of the data while preserving relevant information. High-dimensional data are very common in government agencies, academia, and industry. However, the high dimension and large volume of data raise at least two issues, among many others. One is overcoming the curse of dimensionality, which states that high-dimensional spaces are inherently sparse even with a large number of observations. The other is presenting the information within the data parsimoniously. Dimension reduction techniques address these issues to varying extents by reducing the set of variables to a smaller set of either the original variables or new variables, where the new variables are linear combinations or even nonlinear functions of the original ones. When the new dimension is relatively small, data visualization becomes possible, which often assists data modeling substantially.
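
The sparsity behind the curse of dimensionality can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that observations are spread uniformly over the unit hypercube [0, 1]^p; the function name edge_length_for_fraction is hypothetical. It computes how long the edge of a sub-cube must be to contain a given fraction of the data.

```python
def edge_length_for_fraction(fraction, p):
    """Edge length of a sub-cube of [0, 1]^p that holds `fraction` of its volume.

    Assumes data spread uniformly over the unit hypercube (an illustrative
    assumption, not a property of real data).
    """
    if not 0 < fraction <= 1 or p < 1:
        raise ValueError("need 0 < fraction <= 1 and p >= 1")
    # A sub-cube with edge e has volume e**p, so e = fraction**(1/p).
    return fraction ** (1.0 / p)


if __name__ == "__main__":
    for p in (1, 2, 10, 100):
        print(p, round(edge_length_for_fraction(0.1, p), 3))
```

Under this assumption, capturing 10% of the data requires about 10% of the range in one dimension but roughly 79% of the range of every variable in ten dimensions, so "local" neighborhoods stop being local as p grows.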

Dimension Reduction Methodologies

Based on whether a response is specified or not, dimension reduction techniques generally can be divided into two major categories: supervised dimension reduction and unsupervised dimension reduction.

Unsupervised Dimension Reduction

Unsupervised dimension reduction treats all variables equally without specifying a response. The analysis usually has a natural definition of the information of interest. Unsupervised dimension reduction methods find a smaller set of new variables that either provides a simpler presentation or reveals intrinsic structure in the data while retaining most of the important information. Listed below are only a few of the most widely used techniques.

Principal component analysis (PCA) finds a few orthogonal linear combinations of the original variables with the largest variances; these linear combinations are the principal components that are retained for subsequent analyses. In PCA, the information is the variation within the data. Usually, principal components are sorted in descending order of their variances. The number of principal components to include in the analysis depends on how much of the total variation should be preserved.
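
A minimal sketch of PCA, assuming NumPy is available and using an eigendecomposition of the sample covariance matrix (one common way to compute it). The function name pca and the parameter var_to_keep are hypothetical choices for this illustration; var_to_keep plays the role of "how much variation should be preserved."

```python
import numpy as np


def pca(X, var_to_keep=0.9):
    """Return (scores, components, variances) for the rows of X.

    Keeps the smallest number of leading components whose cumulative
    share of the total variance reaches `var_to_keep`.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each variable
    cov = np.cov(Xc, rowvar=False)             # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, var_to_keep)) + 1, len(eigvals))
    components = eigvecs[:, :k]                # orthonormal directions
    scores = Xc @ components                   # the principal components
    return scores, components, eigvals
```

The columns of scores are the retained principal components; their sample variances equal the leading entries of the returned variances, which are already in descending order.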

Factor analysis assumes that a set of variables establishes the relationships among themselves through a smaller set of common factors. It estimates the common factors with assumptions about the variance-covariance structure.

Canonical correlation analysis identifies and measures the association between two sets of random variables. Often, it finds one linear combination of variables for each set, where these two new variables have the largest correlation.
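
The first pair of canonical variables can be sketched as follows, assuming NumPy and assuming both centered data matrices have full column rank. This uses a QR-plus-SVD route, which is one standard numerical approach rather than the only one; the function name first_canonical_pair is hypothetical.

```python
import numpy as np


def first_canonical_pair(X, Y):
    """Return (rho, a, b): the largest canonical correlation and the
    weight vectors such that corr(X @ a, Y @ b) == rho.

    Assumes the centered X and Y both have full column rank.
    """
    Xc = np.asarray(X, dtype=float)
    Yc = np.asarray(Y, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    Yc = Yc - Yc.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)                  # orthonormal basis for each set
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)        # singular values = canonical correlations
    a = np.linalg.solve(Rx, U[:, 0])           # map back to the original variables
    b = np.linalg.solve(Ry, Vt[0])
    return s[0], a, b
```

Here X @ a and Y @ b are the two new variables, one per set, with the largest attainable correlation.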

Correspondence analysis is a graphical tool for an exploratory data analysis of a contingency table. It projects the rows and columns as points into a plot, where rows (columns) have a similar profile if their corresponding points are close together.

Projection pursuit defines a projection index that measures the “interestingness” of a direction. Then, it searches for the direction maximizing the index.

Multidimensional scaling finds a projection of the data into a lower-dimensional space so that the distances among the points in the new space reflect the proximities in the original data.
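
As an illustration, the sketch below implements classical (metric) multidimensional scaling, one particular variant of the technique, assuming NumPy and assuming the proximities are given as a symmetric matrix of pairwise distances. The function name classical_mds is hypothetical.

```python
import numpy as np


def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n distance matrix D.

    Classical MDS: double-center the squared distances, then use the
    top-k eigenvectors scaled by the square roots of their eigenvalues.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # inner-product (Gram) matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]      # k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)
```

When D holds Euclidean distances among points that truly lie in k dimensions, the embedding reproduces those distances up to rotation, reflection, and translation; otherwise it gives an approximation.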

Supervised Dimension Reduction

Supervised dimension reduction techniques generally are applied in regression. A response Y is specified that can be one random variable, one random vector, or even a curve. The predictor vector X is p-dimensional. The object of interest is the relation between the response and the predictors, which is often summarized as Y = f(X, ε), where ε denotes the error term. Some specific structures are imposed to facilitate the estimation of the function. Dimension reduction is a crucial part of the modeling process. For example, ordinary least squares regression can be considered as a special case of dimension reduction in regression.
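
One way to read the ordinary least squares example is that the p predictors enter the fitted model only through a single linear combination of X. The sketch below, assuming NumPy, makes that reading concrete; the function name ols_direction is hypothetical, and this is an interpretation offered for illustration rather than a formal definition.

```python
import numpy as np


def ols_direction(X, y):
    """Fit OLS on centered data and return (beta, index).

    `index` is the one-dimensional summary Xc @ beta through which the
    p predictors enter the fitted model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                    # centering absorbs the intercept
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    index = Xc @ beta                          # p variables reduced to one
    return beta, index
```

Plotting y against index gives a two-dimensional view of a p-dimensional regression, which is the kind of visualization that a small reduced dimension makes possible.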

...
