Multiple Linear Regression

Julius Sim

doi:10.4135/9781506326139


By: Julius Sim
In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation
Chapter DOI: https://doi.org/10.4135/9781506326139.n453
Subject: Education


Multiple linear regression is an extension of simple linear regression in which values on an outcome (Y) variable are predicted from two or more predictor (X) variables. There are three principal objectives of multiple linear regression: (1) to obtain specific predicted values on Y corresponding to specific observed values on the X variables; (2) to determine how well a predetermined set of X variables predicts values on Y (i.e., to gauge the predictive strength of this set of predictors, taken together); and (3) to select from a group of X variables a subset that are the “best” predictors of Y. This entry reviews the form of the multiple regression model, the assumptions of the analysis, and how to go about selecting and validating a model.

Form of the Multiple Regression Model

The form of the regression model, in the case where there are, for example, three predictor variables, is given by the following equation:

$\hat{Y} = α + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} .$

Here, $\hat{Y}$ is the predicted value of Y, α is the intercept, and each $β_{i}$ is the slope coefficient for the corresponding predictor $X_{i}$. The intercept is a constant that represents the predicted value of Y when each of the X variables has the value 0 (this parameter is not normally of substantive interest). Each slope coefficient, which may be positive or negative, is the change in the predicted value of Y for a 1-unit increase in the X variable concerned. Alternatively, the equation can be represented as

$Y = α + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + ε .$

Here, Y is the observed value of the outcome variable and ε is the residual, that is, the difference between the observed and the predicted value of the outcome variable (Y − Ŷ). The residuals reflect measurement error and the influence of all potential predictors of Y not included in the model.
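The relationship between observed values, predicted values, and residuals can be illustrated with a minimal sketch. It assumes the coefficients are estimated by ordinary least squares, which this entry has not stated explicitly, and it uses a small invented data set: the function names `fit_ols` and `predict` and every number in `X` and `y` are hypothetical and do not come from this entry.

```python
def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] minimising the sum of squared residuals.

    Solves the normal equations by Gauss-Jordan elimination. This is a
    teaching sketch; it fails with ZeroDivisionError if predictors are
    perfectly collinear.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 carries the intercept
    p = len(rows[0])
    # Augmented normal equations: (A'A | A'y)
    aug = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefs, x):
    """Predicted value: intercept plus the sum of slope * predictor."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


# Hypothetical data (invented for illustration): three predictors per case.
X = [(22, 126, 1), (15, 121, 0), (27, 130, 1), (10, 135, 0),
     (18, 124, 1), (25, 133, 0), (12, 128, 1), (20, 127, 0)]
y = [80, 75, 84, 68, 79, 82, 73, 78]

coefs = fit_ols(X, y)                      # [alpha, beta1, beta2, beta3]
predicted = [predict(coefs, xi) for xi in X]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]  # epsilon = Y - Y_hat

print("coefficients:", [round(c, 3) for c in coefs])
print("residuals:   ", [round(e, 3) for e in residuals])
```

Because the model includes an intercept, the least-squares residuals sum to zero apart from rounding error, so each observed value is exactly its predicted value plus its residual.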

As an example of a multiple regression model, assume that students’ examination scores (on a 0–100 scale) are to be predicted from a scale representing their attitudes to schooling (0–30, higher scores more positive), their age (in months), and their sex (0 = male, 1 = female). The intercept for this model is 131.85, and the regression coefficients β1, β2, and β3 (for the attitude scale, age, and sex, respectively) are 0.23, −0.45, and 0.19. For a female student aged 126 months with an attitude score of 22, the predicted exam score would therefore be 131.85 + (0.23 × 22) + (−0.45 × 126) + (0.19 × 1) = 80.40. When there is more than one predictor variable in a regression model, each slope coefficient is adjusted for the others; hence, if age is removed from the model, the coefficient for the attitude scale changes to 0.26 and that for sex to −0.07. The regression coefficient in multiple regression is therefore not simply “the change in Ŷ for a 1-unit change in X,” but “the change in Ŷ for a 1-unit change in X, with the other X variables held constant.”
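The worked prediction above can be reproduced in a few lines. The intercept and coefficients are those given in the example; the function and variable names are illustrative choices, not part of the entry.

```python
# Intercept and slope coefficients as given in the example above.
INTERCEPT = 131.85
B_ATTITUDE = 0.23   # per point on the 0-30 attitude scale
B_AGE = -0.45       # per month of age
B_SEX = 0.19        # 0 = male, 1 = female


def predict_exam_score(attitude, age_months, female):
    """Predicted exam score from the three-predictor model."""
    return (INTERCEPT
            + B_ATTITUDE * attitude
            + B_AGE * age_months
            + B_SEX * female)


# Female student, 126 months old, attitude score of 22.
score = predict_exam_score(attitude=22, age_months=126, female=1)
print(f"{score:.2f}")  # 80.40

# "Other X variables held constant": changing only sex from 0 to 1
# shifts the prediction by exactly the sex coefficient.
sex_difference = score - predict_exam_score(attitude=22, age_months=126, female=0)
print(f"{sex_difference:.2f}")  # 0.19
```

The second calculation shows the "held constant" interpretation in practice: with attitude and age fixed, the difference between the two predictions equals the coefficient for sex.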

Because regression coefficients are often expressed in terms of different scales, their magnitudes cannot be compared directly. So, a coefficient given in terms of points on a 0–30 scale cannot meaningfully be compared with one given in terms of months; this becomes clear when we consider that if age had been given in years, the coefficient would become 12 times larger in magnitude, but its predictive strength would be the same. However, statistical software output normally includes standardized coefficients, which are expressed in standard deviation units; the coefficient represents the change in Ŷ in standard deviation units for a 1 standard deviation increase in X. As these standardized coefficients are on the same scale, their relative magnitude can be assessed within a model (though their comparison across models is not recommended).
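The effect of changing units can be checked with simple arithmetic, sketched below. The age coefficient (−0.45 per month) and the age of 126 months come from the example above. The standard deviations are hypothetical placeholders invented purely for illustration, and the conversion used (unstandardized coefficient multiplied by the standard deviation of X and divided by the standard deviation of Y) is the usual textbook formula rather than something stated in this entry.

```python
B_AGE_PER_MONTH = -0.45                  # from the example above
b_age_per_year = B_AGE_PER_MONTH * 12    # same effect expressed per year

# The contribution of age to the prediction is unchanged by the rescaling.
contribution_months = B_AGE_PER_MONTH * 126        # age in months
contribution_years = b_age_per_year * (126 / 12)   # the same age in years


def standardized(b, sd_x, sd_y):
    """Standardized coefficient: change in Y (in SDs) per 1-SD increase in X."""
    return b * sd_x / sd_y


# HYPOTHETICAL standard deviations, not taken from the entry.
SD_AGE_MONTHS = 6.0
SD_AGE_YEARS = SD_AGE_MONTHS / 12   # the same spread expressed in years
SD_SCORE = 10.0

std_from_months = standardized(B_AGE_PER_MONTH, SD_AGE_MONTHS, SD_SCORE)
std_from_years = standardized(b_age_per_year, SD_AGE_YEARS, SD_SCORE)

print(round(b_age_per_year, 2))                                   # -5.4
print(round(contribution_months, 2), round(contribution_years, 2))
print(round(std_from_months, 3), round(std_from_years, 3))
```

The unstandardized coefficient changes 12-fold with the unit of measurement, whereas the standardized coefficient is the same whichever unit is used, which is why only the latter supports comparisons of magnitude within a model.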

...
