
Multiple Linear Regression

Multiple linear regression is an extension of simple linear regression in which values on an outcome (Y) variable are predicted from two or more predictor (X) variables. There are three principal objectives of multiple linear regression: (1) to obtain specific predicted values on Y corresponding to specific observed values on the X variables; (2) to determine how well a predetermined set of X variables predicts values on Y (i.e., to gauge the predictive strength of this set of predictors, taken together); and (3) to select, from a larger group of X variables, the subset of variables that are the “best” predictors of Y. This entry reviews the form of the multiple regression model, the assumptions of the analysis, and how to go about selecting and validating a model.

Form of the Multiple Regression Model

For a model with three predictor variables, for example, the regression equation takes the following form:

$$\hat{Y} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3.$$

Here, Ŷ is the predicted value of Y, α is the intercept, and β1, β2, and β3 are the slope coefficients. The intercept is a constant representing the predicted value of Y when every X variable has the value 0 (this parameter is not normally of substantive interest). Each slope coefficient, which may be positive or negative, is the change in the predicted value of Y for a 1-unit increase in the X variable concerned. Alternatively, the equation can be written for the observed value of Y as

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon.$$

Here, Y is the observed value of the outcome variable and ε is the residual: the difference between the observed and predicted values of the outcome variable (Y − Ŷ). The residuals reflect measurement error and the influence of all potential predictors of Y that are not included in the model.
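
The residuals can be computed directly once a model has been fitted. The following Python/NumPy code is a minimal sketch. It assumes ordinary least-squares estimation (the usual method, which this part of the entry does not describe) and uses simulated data whose "true" intercept and slopes are made up for illustration. It fits the three-predictor model, forms Ŷ for every case, and takes the residual as Y − Ŷ.

```python
import numpy as np

# Simulated data; the "true" intercept and slopes below are made up for illustration.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                           # three predictors, X1..X3
true_alpha, true_betas = 1.0, np.array([0.8, -0.5, 0.3])
y = true_alpha + X @ true_betas + rng.normal(size=n)  # observed Y, including noise

# Design matrix: a column of 1s (for alpha) followed by the predictors.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)     # least-squares estimates
alpha_hat, betas_hat = coef[0], coef[1:]

y_hat = design @ coef   # predicted values, Y-hat
resid = y - y_hat       # residuals, Y - Y-hat

print("alpha-hat:", round(float(alpha_hat), 3))
print("beta-hats:", np.round(betas_hat, 3))
print("mean residual:", resid.mean())  # essentially 0 (floating-point noise)
```

The model includes an intercept, so the least-squares residuals average to zero (up to rounding) and are uncorrelated with each predictor in the sample. This is a property of the fitting method and is not evidence that the model is correctly specified.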

As an example of a multiple regression model, assume that students’ examination scores (on a 0–100 scale) are to be predicted from a scale representing their attitudes to schooling (0–30, with higher scores indicating more positive attitudes), their age (in months), and their sex (0 = male, 1 = female). The intercept for this model is 131.85, and the regression coefficients β1, β2, and β3 (for the attitude scale, age, and sex, respectively) are 0.23, −0.45, and 0.19. For a female student aged 126 months with an attitude score of 22, the predicted exam score would therefore be 131.85 + (0.23 × 22) + (−0.45 × 126) + (0.19 × 1) = 80.40. When there is more than one predictor variable in a regression model, each slope coefficient is adjusted for the others. For example, if age is removed from the model, the coefficient for the attitude scale changes to 0.26 and that for sex changes to −0.07. The regression coefficient in multiple regression is therefore not simply “the change in Ŷ for a 1-unit change in X,” but “the change in Ŷ for a 1-unit change in X, with the other X variables held constant.”
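
The worked prediction can be checked in a few lines of Python. Part 1 below uses only the coefficients reported in this entry; `predict_exam` is a hypothetical helper name. Part 2 is a separate sketch on simulated data with made-up parameters. These are not the data behind the 0.26 and −0.07 figures, which are not given here. The sketch shows why dropping a predictor that is correlated with the others changes their coefficients. It assumes ordinary least-squares estimation and checks a standard least-squares identity (the omitted-variable formula): each coefficient in the reduced model equals the corresponding full-model coefficient plus the dropped predictor's coefficient times the matching coefficient from regressing the dropped predictor on the retained ones.

```python
import numpy as np

# Part 1: the entry's worked example, using its reported coefficients.
def predict_exam(attitude, age_months, sex):
    """Predicted exam score; sex is coded 0 = male, 1 = female."""
    return 131.85 + 0.23 * attitude - 0.45 * age_months + 0.19 * sex

y_hat = predict_exam(22, 126, 1)
print(round(y_hat, 2))  # 80.4

# One more attitude point, with age and sex held constant, raises Y-hat by the attitude slope.
print(round(predict_exam(23, 126, 1) - y_hat, 2))  # 0.23

# Part 2: simulated data (made-up parameters) showing that slopes are adjusted for each other.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)             # x2 is correlated with x1
x3 = rng.integers(0, 2, size=n).astype(float)  # a 0/1 predictor
y = 2.0 + 1.0 * x1 - 0.8 * x2 + 0.3 * x3 + rng.normal(size=n)

def ols(columns, outcome):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(outcome))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

full = ols([x1, x2, x3], y)  # [alpha, b1, b2, b3]
reduced = ols([x1, x3], y)   # x2 dropped: [alpha, b1, b3]
print("b1 with x2:", round(float(full[1]), 3), "| b1 without x2:", round(float(reduced[1]), 3))

# Omitted-variable identity: reduced = kept full coefficients + b2 * (coefficients of x2 on the kept predictors).
d = ols([x1, x3], x2)
print(np.allclose(reduced, full[[0, 1, 3]] + full[2] * d))  # True
```

The identity makes the adjustment explicit. A retained coefficient shifts when a predictor is dropped unless the dropped predictor's own coefficient is zero or it is unrelated to the retained predictors.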

Because each regression coefficient is expressed in the units of its own predictor, the magnitudes of different coefficients cannot be compared directly. A coefficient expressed per point on a 0–30 scale cannot meaningfully be compared with one expressed per month. Indeed, if age had been measured in years, its coefficient would be 12 times larger, yet its predictive strength would be unchanged. Statistical software output, however, normally includes standardized coefficients, which are expressed in standard deviation units: each represents the change in Ŷ, in standard deviation units, for a 1 standard deviation increase in X. Because standardized coefficients share a common scale, their relative magnitudes can be assessed within a model (though comparing them across models is not recommended).
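
A standardized coefficient can be obtained from an unstandardized one as b* = b × SD(X) / SD(Y), or equivalently by refitting the model after converting Y and every X to z-scores. The Python/NumPy code below is a minimal sketch on simulated data, loosely on the example's scales but with made-up parameters, and it assumes least-squares estimation. It checks the claims about rescaling: re-expressing age in years multiplies its raw coefficient by 12 but leaves the predictions and the standardized coefficients unchanged. The helper names `ols`, `standardized`, and `zscore` are illustrative only.

```python
import numpy as np

# Simulated data loosely on the example's scales; all parameters are made up.
rng = np.random.default_rng(2)
n = 300
attitude = rng.uniform(0, 30, size=n)    # 0-30 attitude scale
age_months = rng.normal(126, 6, size=n)  # age in months
y = 100 + 0.3 * attitude - 0.4 * age_months + rng.normal(0, 5, size=n)

def ols(columns, outcome):
    """Return least-squares coefficients and fitted values for intercept + columns."""
    design = np.column_stack([np.ones(len(outcome))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef, design @ coef

def standardized(slopes, columns, outcome):
    """b*_j = b_j * SD(X_j) / SD(Y): SD change in Y-hat per 1-SD increase in X_j."""
    sd_y = np.std(outcome, ddof=1)
    return np.array([b * np.std(x, ddof=1) / sd_y for b, x in zip(slopes, columns)])

age_years = age_months / 12
coef_m, fit_m = ols([attitude, age_months], y)
coef_y, fit_y = ols([attitude, age_years], y)

print(coef_y[2] / coef_m[2])                               # 12, up to rounding
print(np.allclose(fit_m, fit_y))                           # True: identical predictions
print(standardized(coef_m[1:], [attitude, age_months], y))
print(standardized(coef_y[1:], [attitude, age_years], y))  # same values as the line above

# The same standardized slopes come from regressing z-scored Y on z-scored predictors.
def zscore(v):
    return (v - v.mean()) / np.std(v, ddof=1)

coef_z, _ = ols([zscore(attitude), zscore(age_months)], zscore(y))
print(np.allclose(coef_z[1:], standardized(coef_m[1:], [attitude, age_months], y)))  # True
```

Only the ratio of standard deviations enters b*. It therefore makes no difference whether the standard deviations use n or n − 1, provided both use the same divisor.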

...
