
The multiple correlation arises in the context of MULTIPLE REGRESSION ANALYSIS; it is a one-number summary measure of the accuracy of prediction from the regression model.

In multiple regression analysis, a single dependent variable Y (or criterion) is predicted from a set of independent variables (or predictors). In the most common form of multiple regression, multiple LINEAR REGRESSION (or ORDINARY LEAST SQUARES regression analysis), the independent variables are aggregated into a linear combination according to the following linear regression equation:

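$$\hat{Y} = b_1 X_1 + b_2 X_2 + \cdots + b_p X_p + b_0$$

Here $b_0$ is the regression intercept (constant).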

The Xs are the predictors. Each predictor is multiplied by a weight, called the PARTIAL REGRESSION COEFFICIENT, $b_1, b_2, \ldots, b_p$. Then, according to the regression equation, the linear combination (or weighted sum) of the scores on the set of predictors is computed. This weighted sum, denoted $\hat{Y}$, is termed the predicted score. The regression coefficients $b_1, b_2, \ldots, b_p$ are chosen so that the correlation between the actual dependent variable Y and the predicted score $\hat{Y}$ is as large as possible. This maximum correlation between a single criterion score Y and a linear combination of a set of Xs is the multiple correlation, $R_{Y\hat{Y}}$. In usual practice, the square of this correlation is reported, referred to as the squared multiple correlation $R^2_{Y\hat{Y}}$, or R-SQUARED.

The squared multiple correlation is a central measure in multiple regression analysis—it summarizes the overall adequacy of the set of predictors in accounting for the dependent variable. The squared multiple correlation is the proportion of variation in the criterion that is accounted for by the set of predictors.
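The definition above can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data (the data, the coefficient values, and the helper name `multiple_correlation` are illustrative and not part of the entry). It fits a least squares regression with an intercept, correlates Y with the predicted scores, and confirms that the square of that correlation matches the proportion of variation accounted for.

```python
import numpy as np

def multiple_correlation(X, y):
    """Return (R, R_squared, coefficients) for a least squares fit of y on X.

    Hypothetical helper for illustration; an intercept column is added here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the fit includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    # Multiple correlation: Pearson correlation between Y and predicted Y.
    r = np.corrcoef(y, y_hat)[0, 1]
    # Proportion of variation accounted for: 1 - SS_residual / SS_total.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return r, r_squared, coef

# Simulated data (an assumption for this sketch, not data from the entry).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=200)

r, r_squared, coef = multiple_correlation(X, y)
print(round(r, 3), round(r_squared, 3))
```

With an intercept in the model, `r ** 2` and `r_squared` agree up to floating-point error, which is why the squared multiple correlation can be read as a proportion of variation.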

As an example, consider an undergraduate statistics course in which three tests are given during the semester. Suppose in a class of 50 students, we predict scores on Test 3 from scores on Tests 1 and 2 using linear regression as the method of analysis. The resulting linear regression equation is as follows:

None

For each student, we substitute the scores on Test 1 and Test 2 into the regression equation and compute the predicted score $\hat{Y}$ on Test 3. We then correlate these predicted scores with actual scores on Test 3; the resulting correlation is the multiple correlation. Here the multiple correlation is $R_{Y\hat{Y}} = .65$, quite a substantial correlation. Students' performance on the third test is closely related to performance on the first two tests. The squared multiple correlation is $R^2_{Y\hat{Y}} = .42$. We would describe this result by saying that 42% of the variation in the observed Test 3 scores is accounted for by scores on Test 1 and Test 2.

The multiple correlation ranges between 0 and 1. As predictors are added to the regression equation, the multiple correlation either remains the same or increases. The multiple correlation does not take into account the number of predictors. Moreover, the sample multiple correlation tends to overestimate the population multiple correlation. An adjusted squared multiple correlation that is less biased (though not unbiased) is given as follows, where n is the number of cases and p is the number of predictors:
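The claim that adding predictors never lowers the multiple correlation can be illustrated with a short simulation. This is a sketch under stated assumptions: NumPy, simulated data, and a hypothetical helper `r_squared`; the extra predictors are pure noise, unrelated to the criterion by construction.

```python
import numpy as np

def r_squared(X, y):
    """Squared multiple correlation for a least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Simulated data (an assumption for this sketch, not data from the entry).
rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 2))                     # two genuine predictors
y = X @ np.array([0.6, 0.4]) + rng.normal(size=n)
noise = rng.normal(size=(n, 5))                 # five irrelevant predictors

# Refit with 0, 1, ..., 5 noise predictors appended to the genuine ones.
r2_values = [r_squared(np.column_stack([X, noise[:, :k]]), y) for k in range(6)]
for k, value in enumerate(r2_values):
    print(k, round(value, 4))
```

Each refit contains the previous model as a special case, so the printed values cannot decrease even though the added predictors carry no information about the criterion. This is the behavior that motivates an adjustment for the number of predictors.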

$$\text{adjusted } R^2_{Y\hat{Y}} = 1 - \left(1 - R^2_{Y\hat{Y}}\right)\frac{n - 1}{n - p - 1}$$

For our analysis with n = 50 students and p = 2 predictors, the adjusted $R^2_{Y\hat{Y}} = .39$.
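The adjustment can be sketched in a few lines. The function below assumes the commonly used formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), which is taken here to be the one the entry intends; the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted squared multiple correlation for n cases and p predictors.

    Assumes the formula 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    if n - p - 1 <= 0:
        raise ValueError("need more cases than predictors plus one (n > p + 1)")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)

# Values from the example: R^2 = .42 (as rounded in the text), n = 50, p = 2.
adj = adjusted_r_squared(0.42, n=50, p=2)
print(round(adj, 3))  # 0.395
```

Starting from the rounded value .42, the formula gives about .395; the reported .39 was presumably computed from the unrounded squared multiple correlation, so the small difference is a rounding effect rather than a disagreement.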

...
