How-to Guide for IBM® SPSS® Statistics Software

Introduction

In this guide you will learn how to estimate a simple regression model in IBM® SPSS® Statistics software (SPSS), using a practical example to illustrate the process. You will find links to the example dataset, and you are encouraged to replicate the example. An additional practice example is suggested at the end of this guide. The example assumes you have already opened the data file in SPSS.

Contents

- 1 Simple Regression
- 2 An Example in SPSS: CO2 Emissions and Engine Size in Automobiles
- 2.1 The SPSS Procedure
- 2.2 Exploring the SPSS Output
- 3 Your Turn

1 Simple Regression

Simple regression expresses a dependent, or response, variable as a linear function of an independent variable. This requires estimating an intercept (often called a constant) and a slope, which together describe the expected value of the dependent variable at any particular value of the independent variable. Most attention is typically focused on the slope estimate because it captures the relationship between the dependent and independent variables. The dependent variable should be continuous. This example focuses on an independent variable that is also continuous, though the model can also accommodate a categorical independent variable (see Regression with Dummy Variables).
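In symbols, the model described above can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the SPSS output: y_i is the dependent variable for case i, x_i is the independent variable, \beta_0 is the intercept (constant), \beta_1 is the slope, and \varepsilon_i is the error term. The second line gives the fitted line, with b_0 and b_1 denoting the estimates.

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
\]
\[
  \hat{y}_i = b_0 + b_1 x_i
\]
```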

2 An Example in SPSS: CO2 Emissions and Engine Size in Automobiles

This example uses two variables from the 2015 Fuel Consumption Report from Natural Resources Canada:

- The CO2 emissions of an automobile (co2emissions), measured in grams per kilometer.
- The size of the automobile’s engine (enginesize), measured in liters.

In this dataset, CO2 emissions have a mean of approximately 244.69 grams per kilometer, with a standard deviation of about 55.70. Engine size has a mean of approximately 3.13 liters, with a standard deviation of about 1.27. Both of these variables are continuous, making them appropriate for simple regression.
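For readers who want to see how summary statistics of this kind are computed, here is a minimal sketch in Python, outside SPSS. The engine sizes below are invented for illustration and are not values from the Fuel Consumption Report, so the results will not match the figures quoted above. The sketch assumes the sample (n - 1) formula for the standard deviation.

```python
import statistics

# Hypothetical engine sizes in liters, invented for illustration;
# these are NOT values from the 2015 Fuel Consumption Report.
enginesize = [1.6, 2.0, 2.4, 3.0, 3.5, 5.0]

# Arithmetic mean of the values
mean_size = statistics.mean(enginesize)

# statistics.stdev applies the sample (n - 1) formula
sd_size = statistics.stdev(enginesize)

print(f"mean = {mean_size:.2f}, sd = {sd_size:.2f}")
```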

2.1 The SPSS Procedure

When conducting a simple regression, it is often wise to examine each variable in isolation first. We start by presenting histograms of CO2 emissions and engine size, respectively. This is done in SPSS by selecting from the menu:

Analyze → Descriptive Statistics → Explore

In the Explore dialog box that opens, move the CO2 emissions and engine size variables into the Dependent List: box. On the right side of the Explore dialog box, click the “Plots” button. This opens another dialog box where you can select the plots you want to produce. For this example, just check “Histogram” under the Descriptive heading. Then click Continue and OK to perform the analysis.

Screenshots for the procedure to produce histograms in SPSS are available in the How-to Guide for the Dispersion of Continuous Variables topic.
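A histogram simply counts how many values fall into each interval (bin) of a fixed width. The Python sketch below illustrates that idea only; it is not how SPSS chooses its bins, and the function name, the bin width, and the data are all hypothetical.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical engine sizes in liters, invented for illustration
enginesize = [1.6, 2.0, 2.4, 3.0, 3.5, 5.0]
counts = histogram_counts(enginesize, 1.0)

# Print a rough text histogram, one row per non-empty bin
for lower_edge, count in counts.items():
    print(f"{lower_edge:4.1f} | {'#' * count}")
```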

To estimate a simple regression model in SPSS, select from the menu:

Analyze → Regression → Linear

In the Linear Regression dialog box that opens, move the CO2 emissions variable (co2emissions) into the Dependent: window and move the engine size variable (enginesize) into the Independent(s): window.

Figure 1 shows what this looks like in SPSS.

Figure 1: Selecting simple regression from the Analyze menu in SPSS.

Once you are done, click OK to perform the analysis.
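To make the estimation step less of a black box, the following Python sketch shows the standard least-squares formulas for a simple regression: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of the independent variable, and the intercept follows from the two means. This is an illustration under stated assumptions, not SPSS's internal implementation; the function name and the four data points are hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squared deviations of x from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, invented for illustration (not from the report)
enginesize = [1.0, 2.0, 3.0, 4.0]
co2emissions = [170.0, 200.0, 245.0, 270.0]

intercept, slope = simple_ols(enginesize, co2emissions)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```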

2.2 Exploring the SPSS Output

Figures 2 and 3 present the histograms for CO2 emissions in grams per kilometer and engine size in liters, respectively.

Figure 2: Histogram showing the distribution of CO2 emissions in automobiles, 2015 Fuel Consumption Report, Natural Resources Canada.

Figure 2 shows that the majority of values for CO2 emissions fall below 275 grams per kilometer, with the bulk of observations clustered around the mean of about 245 grams per kilometer. There are a few extreme values, with the largest being 437 grams per kilometer. Researchers may want to explore whether cases with these extreme values have undue influence on the regression.

Figure 3: Histogram showing the distribution of engine size (in liters) in automobiles, 2015 Fuel Consumption Report, Natural Resources Canada.

Figure 3 shows the distribution of engine size. Most of the observations are less than 3.5 liters. There are a few extreme values, with the largest being 6.8 liters. As with CO2 emissions, researchers may want to explore whether cases with these extreme values have undue influence on the regression.

Figure 4 presents four tables of results that are produced by the simple linear regression procedure in SPSS. The fourth table, outlined in red, includes the results of primary interest.

Figure 4: Simple regression of CO2 emissions on engine size, 2015 Fuel Consumption Report, Natural Resources Canada.

The first three tables in Figure 4 report the independent variable(s) entered into the model, some summary fit statistics for the regression model, and an analysis of variance for the model as a whole. While detailed examination of these tables is beyond the scope of this example, we note in the second table that R Square measures the proportion of the variance in the dependent variable explained by the model, which in this case consists of a single independent variable. An R Square of 0.702 means that more than 70% of the variance in CO2 emissions can be explained by the size of the engine.
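The definition of R Square as a proportion of explained variance can be sketched in Python as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and data below are hypothetical and invented for illustration, so the result will not match the 0.702 reported in Figure 4.

```python
def r_squared(x, y):
    """R Square for a simple least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: variation left unexplained by the line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its own mean
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data, invented for illustration (not from the report)
enginesize = [1.0, 2.0, 3.0, 4.0]
co2emissions = [170.0, 200.0, 245.0, 270.0]

r2 = r_squared(enginesize, co2emissions)
print(f"R Square = {r2:.3f}")
```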

The fourth table in Figure 4, outlined in red, presents the estimates of the intercept, or constant, and the slope coefficient. It reports an estimate of the intercept (constant) as equal to approximately 130.32. The constant of a simple regression model can be interpreted as the average expected value of the dependent variable when the independent variable equals zero. In this case, our independent variable, enginesize, can never be zero, so the constant by itself does not tell us much. Researchers do not often base predictions on the intercept, so it often receives little attention.

The fourth table in Figure 4 reports an estimated slope coefficient linking engine size to CO2 emissions of approximately 36.59. This represents the average marginal effect of engine size on CO2 emissions, and can be interpreted as the expected change in the dependent variable, on average, for a one-unit increase in the independent variable. For this example, each additional liter of engine size is associated with an average increase in CO2 emissions of about 36.59 grams per kilometer. The table also reports that this estimate is statistically significantly different from zero. This leads us to reject the null hypothesis that the slope equals zero and to conclude that there does appear to be a positive relationship between CO2 emissions and engine size.
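As a worked illustration, the reported estimates can be combined into a prediction equation: predicted CO2 emissions = 130.32 + 36.59 × engine size. The Python sketch below applies it to two engine sizes, 2.0 and 3.0 liters, which are chosen here purely for illustration; the function name is hypothetical.

```python
# Rounded estimates reported in the fourth table of Figure 4
INTERCEPT = 130.32  # constant, in grams per kilometer
SLOPE = 36.59       # grams per kilometer per additional liter

def predicted_co2(engine_size_liters):
    """Predicted CO2 emissions (g/km) from the fitted regression line."""
    return INTERCEPT + SLOPE * engine_size_liters

# Illustrative engine sizes (assumed values, not specific cases in the data)
small = predicted_co2(2.0)
large = predicted_co2(3.0)

print(f"2.0 L: {small:.2f} g/km")
print(f"3.0 L: {large:.2f} g/km")
print(f"difference: {large - small:.2f} g/km")
```

The predictions are about 203.50 and 240.09 grams per kilometer, and the difference between them equals the slope, 36.59, which is exactly the one-unit interpretation described above.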

There are multiple diagnostic tests researchers might perform after estimating a simple regression model to evaluate whether the model appears to violate any of the OLS assumptions, or whether there are other kinds of problems, such as particularly influential cases. Describing all of these diagnostic tests is well beyond the scope of this example.

3 Your Turn

Download this sample dataset and see if you can replicate the results presented here. The sample dataset also includes another variable, called fuelusecity, that measures city fuel consumption in liters per 100 kilometers. Try producing your own simple regression by replacing CO2 emissions with city fuel consumption as the dependent variable.

IBM® SPSS® Statistics software (SPSS) screenshots Republished Courtesy of International Business Machines Corporation, © International Business Machines Corporation. SPSS Inc. was acquired by IBM in October, 2009. IBM, the IBM logo, ibm.com, and SPSS are trademarks or registered trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “IBM Copyright and trademark information” at http://www.ibm.com/legal/copytrade.shtml.