In this guide you will learn how to estimate a multiple regression model in IBM® SPSS® Statistics software (SPSS) using a practical example to illustrate the process. Readers are provided links to the example dataset and encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have already opened the data file in SPSS.
Multiple regression expresses a dependent, or response, variable as a linear function of two or more independent variables. This requires estimating an intercept (often called a constant) and a slope for each independent variable; together these describe the expected value of the dependent variable at any particular combination of values of the independent variables. Most attention is typically focused on the slope estimates because they capture the relationship between the dependent and independent variables. The dependent variable should be continuous. This example will focus on using independent variables that are also continuous, though the model can also accommodate categorical independent variables (see Regression with dummy variables).
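For the two independent variables used in this example, the linear function described above can be written out as follows. The symbols (Y, X_1, X_2, \beta, \varepsilon) are conventional notation introduced here for illustration; they do not appear in the SPSS output.

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i
```

Here \beta_0 is the intercept (constant), \beta_1 and \beta_2 are the partial slopes for the two independent variables, and \varepsilon_i is the error term for case i.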
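This example uses three variables from the IAT2012 dataset: tblack (self-reported attitudes to African Americans), impwhitegood (implicit attitudes) and politicalid_7 (political ideology).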
The implicit attitudes scale has a range of −1.9 to 1.8. Its mean is 0.32. The self-reported attitudes to African Americans scale runs from 0 to 10. Its mean is 7.2. politicalid_7 is a scale that runs from 1 to 7, with a mean of 4.4. tblack and politicalid_7 are formally ordinal variables, while impwhitegood is an interval/ratio variable. In accordance with common practice in applied research, we treat all three variables as continuous and interval level.
When conducting a multiple regression, it is often wise to examine each variable in isolation first. We start by presenting histograms of self-reported attitudes to African Americans, implicit attitudes and political ideology. This is done in SPSS by selecting from the Menu:
Analyze → Descriptive Statistics → Explore
In the Explore dialog box that opens, move the impwhitegood, politicalid_7 and tblack variables into the Dependent List: box. On the right side of the Explore dialog box, click the “Plots” button. This opens another dialog box where you can select the plots you want to produce. For this example, just check “Histogram” under the Descriptive heading. Then click Continue and OK to perform the analysis.
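For readers who want to see what a histogram of a discrete scale amounts to outside the SPSS dialog boxes, the following is a minimal Python sketch that counts how often each value occurs and prints a text bar for each. The function name text_histogram and the responses are hypothetical, made up for illustration; they are not values from the IAT2012 dataset and this is not how SPSS draws its plots.

```python
from collections import Counter


def text_histogram(values):
    """Return one line per distinct value: the value, a bar of '#', and the count."""
    counts = Counter(values)
    lines = []
    for value in sorted(counts):
        lines.append(f"{value}: {'#' * counts[value]} ({counts[value]})")
    return lines


# Hypothetical responses on a 1-7 scale (invented for illustration only).
responses = [4, 5, 4, 3, 6, 4, 5, 2, 7, 4, 5, 3]

for line in text_histogram(responses):
    print(line)
```

A roughly symmetric pile of bars around the middle of the scale is the kind of shape the histograms in this guide are inspected for.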
Screenshots of the procedure for producing histograms in SPSS are available in the How-to Guides for the Dispersion of a Continuous Variable topic that is part of SAGE Research Methods Datasets.
You estimate a multiple regression model in SPSS by selecting from the Menu:
Analyze → Regression → Linear
In the Linear Regression dialog box that opens, you move the tblack variable into the Dependent: window and you move the politicalid_7 and impwhitegood variables into the Independent(s): window. Once you are done, click OK to perform the analysis. Figure 1 shows what this looks like in SPSS.
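The calculation behind this dialog box can be sketched in a few lines of Python. This is a minimal illustration of ordinary least squares with exactly two independent variables, using the textbook closed-form solution based on centered sums of squares and cross-products; it is not SPSS's own implementation. The function name fit_two_predictor_ols and the small data lists are hypothetical, and the outcome values are generated from an assumed exact linear rule so that the estimates can be checked by eye.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Estimate the intercept and two partial slopes by ordinary least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    det = s11 * s22 - s12 ** 2  # zero when the predictors are perfectly collinear
    if det == 0:
        raise ValueError("predictors are perfectly collinear")

    b1 = (s22 * s1y - s12 * s2y) / det  # partial slope for x1
    b2 = (s11 * s2y - s12 * s1y) / det  # partial slope for x2
    b0 = my - b1 * m1 - b2 * m2         # intercept (constant)
    return b0, b1, b2


# Hypothetical data (invented for illustration, not from IAT2012).
ideology = [1, 2, 3, 4, 5, 6, 7]
implicit = [0.5, -0.2, 0.3, 1.0, -0.4, 0.1, 0.6]
# Outcome built from an assumed rule with no noise: 7 + 0.1*ideology - 1.0*implicit.
attitude = [7 + 0.1 * a - 1.0 * b for a, b in zip(ideology, implicit)]

b0, b1, b2 = fit_two_predictor_ols(ideology, implicit, attitude)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

Because the hypothetical outcome contains no random error, the estimates reproduce the assumed intercept and slopes; with real survey data the estimates would come with standard errors, which this sketch does not compute.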
Figure 2 shows that impwhitegood is approximately normally distributed with a slight negative skew. This makes it appropriate as an independent variable in multiple regression analysis.
Figure 3 shows that tblack is approximately normally distributed but negatively skewed. OLS regression is relatively robust to violations of the assumption of normality, and the variable looks suitable for inclusion in the analysis.
Figure 4 shows that the mean score on the political ideology scale is just under 4.5. Only seven discrete values are possible in the distribution, given the response options available, but OLS regression is relatively robust to violations of the assumption of normality, and in fact the distribution looks approximately normal.
Figure 5 presents four tables of results that are produced by the linear regression procedure in SPSS. The fourth table shows the results of primary interest.
The first three tables in Figure 5 report the independent variables entered into the model, some summary fit statistics for the regression model, and an analysis of variance for the model as a whole. While detailed examination of these tables is beyond the scope of this example, we note that in the second table, R Square measures the proportion of the variance in the dependent variable explained by the model. An R Square of .067 means that only about 6.7 percent of the variance in self-reported attitudes is accounted for by political ideology and implicit attitudes.
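The definition of R Square given above can be made concrete with a short Python sketch: one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the observed and fitted values are hypothetical, chosen so the arithmetic is easy to follow; they are not taken from the SPSS output.

```python
def r_squared(observed, predicted):
    """Proportion of variance in the observed values accounted for by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


# Hypothetical observed scores and fitted values (invented for illustration).
observed = [6, 8, 7, 9]
predicted = [6.5, 7.5, 7.5, 8.5]

# Total sum of squares is 5.0 and residual sum of squares is 1.0, so R Square is 0.8.
print(r_squared(observed, predicted))
```

An R Square of .067, as in this example, corresponds to a residual sum of squares that is still about 93 percent of the total.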
The fourth table in Figure 5 presents an estimate of the intercept (or constant) as equal to approximately 6.9. The constant of a multiple regression model can be interpreted as the expected value of the dependent variable when all of the independent variables equal zero. In this case, the political ideology variable has no zero point, so the constant has no useful interpretation. For this kind of reason, researchers rarely have predictions about the intercept, so it often receives little attention.
The fourth table in Figure 5 reports that the partial slope coefficient linking political ideology to attitudes to African Americans is estimated to be 0.141. This represents the average marginal effect of each additional point on the political ideology scale on self-reported attitude, and can be interpreted as the expected change in the dependent variable on average for a one-unit increase in the independent variable, controlling for implicit attitudes. It is called a partial coefficient because it represents the unique association with the dependent variable, not that which is shared with the other independent variable(s). For this example, that means that every increase of one point on the ideology scale (i.e. in the more liberal direction) is associated with an increase in attitude score of about 0.141, adjusted for implicit attitudes. This means that those who are more liberal tend to have more favourable self-reported attitudes towards African Americans.
The same table also reports that the partial slope coefficient linking implicit attitude to self-reported attitude is estimated to be −1.15. This represents the average marginal effect of implicit attitude on self-reported attitude, and can be interpreted as the expected change in the dependent variable on average for a one-unit increase in the independent variable, controlling for political ideology. For this example, that means that for every additional point scored on the implicit attitude scale, the self-reported attitude to African Americans is expected to decline by 1.15 points, controlling for political ideology. That is to say that implicit and self-reported attitudes towards African Americans are negatively associated with each other. The table also reports that both slope estimates are statistically significantly different from zero. This leads us to reject both null hypotheses and conclude that there appear to be relationships with self-reported attitudes for both political ideology and implicit attitudes.
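To see how the reported estimates combine, the following Python sketch plugs them into the regression equation. The coefficients are the values quoted in the text (the intercept is only given approximately, as 6.9, so predictions are approximate too); the function name predict_tblack and the chosen input values are illustrative assumptions.

```python
INTERCEPT = 6.9       # approximate constant reported in the text
B_IDEOLOGY = 0.141    # partial slope for politicalid_7
B_IMPLICIT = -1.15    # partial slope for impwhitegood


def predict_tblack(politicalid_7, impwhitegood):
    """Expected self-reported attitude score for given values of the two predictors."""
    return INTERCEPT + B_IDEOLOGY * politicalid_7 + B_IMPLICIT * impwhitegood


# A respondent at the scale midpoint (4) with the sample mean implicit score (0.32):
# 6.9 + 0.141 * 4 - 1.15 * 0.32 = 7.096
print(round(predict_tblack(4, 0.32), 3))

# Moving one point on the ideology scale, holding implicit attitude fixed,
# changes the prediction by the partial slope, 0.141.
print(round(predict_tblack(5, 0.32) - predict_tblack(4, 0.32), 3))
```

Holding one variable fixed while changing the other by one unit is exactly what "controlling for" means in the interpretation of each partial slope.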
There are multiple diagnostic tests researchers might perform following the estimation of a multiple regression model to evaluate whether the model appears to violate any of the OLS assumptions or whether there are other kinds of problems such as particularly influential cases. Describing all of these diagnostic tests is well beyond the scope of this example.
Download this sample dataset to see if you can reproduce the results presented here, and try producing your own multiple regression by replacing tblack with twhite as the dependent variable.
IBM® SPSS® Statistics software (SPSS) screenshots Republished Courtesy of International Business Machines Corporation, © International Business Machines Corporation. SPSS Inc. was acquired by IBM in October, 2009. IBM, the IBM logo, ibm.com, and SPSS are trademarks or registered trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “IBM Copyright and trademark information” at http://www.ibm.com/legal/copytrade.shtml.