
Represented by r² in the bivariate case and R² in the multivariate case, the coefficient of determination is a measure of GOODNESS OF FIT in ORDINARY LEAST SQUARES LINEAR REGRESSION. (In the multivariate case, it is often called the coefficient of multiple determination.) The statistic measures how well the estimated regression line fits the actual data. It tells us how much of the variation in the dependent variable, y, is explained by all of the independent variables in the model taken together. Specifically, it is the ratio of the explained variation in y to the total variation in y.

Several equations may be used to calculate the coefficient of determination. The first divides the explained sum of squares by the total sum of squares: R² = ESS/TSS. The second subtracts from 1 the ratio of the residual sum of squares (the unexplained sum of squares) to the total sum of squares: R² = 1 − (RSS/TSS). For an OLS regression that includes an intercept, the two give the same value because TSS = ESS + RSS.
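As a minimal sketch in Python, the two equations can be checked against each other on a simple regression with an intercept. The data values and the helper names `ols_fit` and `sums_of_squares` are hypothetical, chosen only for illustration.

```python
def ols_fit(x, y):
    """Simple OLS with an intercept (closed form); returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(y, y_hat):
    """Return (TSS, ESS, RSS) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained variation
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained (residual) variation
    return tss, ess, rss


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]

a, b = ols_fit(x, y)
y_hat = [a + b * xi for xi in x]
tss, ess, rss = sums_of_squares(y, y_hat)

r2_explained = ess / tss       # R2 = ESS/TSS
r2_residual = 1 - rss / tss    # R2 = 1 - (RSS/TSS)
print(round(r2_explained, 4), round(r2_residual, 4))  # prints: 0.6 0.6
```

Here ESS = 3.6, RSS = 2.4, and TSS = 6, so both formulas give R² = 0.6.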

Because it is the ratio of the explained sum of squares to the total sum of squares, and because ESS cannot exceed TSS in an OLS regression with an intercept, the measure falls between 0 and 1. A value of 1 indicates a perfect fit of the linear regression line to the data; values closer to 0 suggest a poor fit. If x and y have no linear relationship in the sample (a sample correlation of 0), R² will equal 0. It should be noted, however, that R² computed as 1 − (RSS/TSS) can be negative when the model is estimated without an intercept or is applied to data other than those used to fit it. This happens when simply predicting the sample average of y accounts for more of the variation in the dependent variable than the independent variables explain, and such negative values are not bounded below by –1. Multiplying the measure by 100 gives a value that allows for clearer interpretation: an R² of .25, for example, would be interpreted by saying that 25% of the variation in the dependent variable is explained by the independent variables in the model.
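The following Python sketch contrasts an OLS fit with an intercept, where R² stays between 0 and 1, with a fit forced through the origin, where 1 − (RSS/TSS) falls well below –1 because the sample mean of y predicts better than the model does. The data and helper names are hypothetical; the last line shows the percentage interpretation.

```python
def r_squared(y, y_hat):
    """R2 computed as 1 - RSS/TSS, with TSS measured around the sample mean of y."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss / tss


def fit_with_intercept(x, y):
    """Simple OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def fit_through_origin(x, y):
    """OLS slope with the intercept forced to zero: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


x = [1, 2, 3]     # hypothetical data
y = [10, 8, 8]

a, b = fit_with_intercept(x, y)
r2_ols = r_squared(y, [a + b * xi for xi in x])

b0 = fit_through_origin(x, y)
r2_origin = r_squared(y, [b0 * xi for xi in x])

print(round(r2_ols, 4), round(r2_origin, 4))               # prints: 0.75 -17.5357
print(f"{r2_ols:.0%} of the variation in y is explained")  # prints: 75% of the variation in y is explained
```

The through-origin model is not a sensible choice for these data, which is exactly why its R² is so far below zero: its residual sum of squares is many times larger than the variation of y around its own mean.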

The square root of the coefficient of determination is known as the coefficient of MULTIPLE CORRELATION; in the bivariate case, it equals the absolute value of the sample CORRELATION coefficient, r, whose sign matches the sign of the estimated slope. Conversely, R² is the square of the coefficient of multiple correlation, which is the correlation between the dependent variable estimated from the independent variables (ŷ) and the actual dependent variable (y).
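A short Python sketch can confirm these relationships on hypothetical data with a negative slope. The helper `pearson` is a plain implementation of the sample correlation, written out here rather than taken from any particular library.

```python
import math


def pearson(u, v):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    su = math.sqrt(sum((ui - u_bar) ** 2 for ui in u))
    sv = math.sqrt(sum((vi - v_bar) ** 2 for vi in v))
    return cov / (su * sv)


x = [1, 2, 3]     # hypothetical data with a negative slope
y = [10, 8, 8]

# Simple OLS fit with an intercept.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Coefficient of determination, R2 = 1 - RSS/TSS.
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
tss = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - rss / tss

multiple_r = pearson(y, y_hat)   # correlation of y with y-hat; not negative for OLS with an intercept
r_xy = pearson(x, y)             # bivariate sample correlation; takes the sign of the slope

print(round(r2, 4), round(multiple_r, 4), round(r_xy, 4))  # prints: 0.75 0.866 -0.866
```

Squaring either 0.866 or –0.866 reproduces R² = 0.75, but only the bivariate r carries the negative sign of the slope.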

Although much attention, perhaps sometimes too much, is paid to this measure, there are a number of important points to remember when interpreting it. First, no matter how high your R² is, it is evidence only of correlation; it does not provide positive support for causation. That is, a high R² does not allow you to state that your independent variables caused your dependent variable. Second, a modest value does not necessarily mean a poor model: although 30%, for example, is not an extremely high value, explaining 30% of the variation of a factor in the political environment is often still a substantial portion. At the other extreme, when conducting time-series analysis, you may often find R² values over .90. Third, as independent variables are added to the model, R² will never decrease and will usually increase, so we should avoid trying to maximize the coefficient of determination at the expense of theoretically sound models. Finally, R² values cannot be meaningfully compared between two models that do not have the same dependent variable.
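The third point can be illustrated with a small Python sketch, under the assumption of hypothetical data: `x2` below is an arbitrary filler column with no theoretical role, yet adding it does not lower R². The sketch solves the normal equations directly with a hand-written Gauss-Jordan routine (`solve`, a hypothetical helper), which is adequate for a toy example but is not how production statistics packages fit OLS.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_r2(columns, y):
    """Regress y on an intercept plus the given predictor columns; return R2 = 1 - RSS/TSS."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    y_hat = [sum(bj * xij for bj, xij in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss / tss


y = [3, 5, 4, 7, 8, 9]                  # hypothetical data
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0.3, -1.2, 0.8, 0.1, -0.5, 0.9]   # arbitrary filler column with no theoretical role

r2_small = ols_r2([x1], y)
r2_big = ols_r2([x1, x2], y)
print(round(r2_small, 4), round(r2_big, 4))  # first value is 0.9; the second is at least as large
```

The result does not depend on the particular filler values: the larger model can always reproduce the smaller one by setting the new coefficient to zero, so least squares cannot leave a larger residual sum of squares.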

...
