How-to Guide for IBM® SPSS® Statistics Software
Introduction

In this guide you will learn how to estimate a simple regression model in IBM® SPSS® Statistics software (SPSS) using a practical example to illustrate the process. You will find links to the example dataset and you are encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have already opened the data file in SPSS.

Contents
• 1 Simple Regression
• 2 An Example in SPSS: Infant Mortality and Poverty Across the 50 U.S. States
• 2.1 The SPSS Procedure
• 2.2 Exploring the SPSS Output
1 Simple Regression

Simple regression expresses a dependent, or response, variable as a linear function of an independent variable. This requires estimating an intercept (often called a constant) and a slope, which together give the expected value of the dependent variable at any particular value of the independent variable. Most attention is typically focused on the slope estimate because it captures the relationship between the dependent and independent variables. The dependent variable should be continuous. This example will focus on using an independent variable that is also continuous, though the model can also accommodate a categorical independent variable (see Regression with Dummy Variables).
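In symbols, the model is y = a + b * x + error, and ordinary least squares (OLS) picks the intercept a and slope b that minimize the sum of squared residuals. The solution is b = Sxy / Sxx and a = mean(y) - b * mean(x), where Sxy is the sum of cross-products of deviations from the means and Sxx is the sum of squared deviations of x. The minimal Python sketch below implements exactly these formulas. The function name ols_simple is our own and hypothetical, not part of SPSS or any library.

```python
def ols_simple(x, y):
    """Return (intercept, slope) for y = a + b*x estimated by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 values)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The OLS line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up points lying exactly on y = 1 + 2x
print(ols_simple([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```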

2 An Example in SPSS: Infant Mortality and Poverty Across the 50 U.S. States

This example uses two variables from the 2012 U.S. Statistical Abstracts, both of which were actually measured in 2007, for the 50 U.S. states plus Washington, DC:

• A state’s infant mortality rate in 2007 (infantmort), measured as number of deaths per 1000 births.
• A state’s poverty rate in 2007 (poverty), measured as the percentage of persons living in poverty.

The poverty rate variable has a mean of 12.67, a standard deviation of 3.12, a minimum value of 7.1 and a maximum value of 20.6. The infant mortality variable has a mean of 7.07, a standard deviation of 1.50, a minimum value of 4.8 and a maximum value of 13.1. Both variables are continuous measures, making them appropriate for simple regression.
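The four summary statistics quoted above can be reproduced outside SPSS as a cross-check. The minimal sketch below uses Python's standard-library statistics module; the function name describe is our own, and the input values are made up for illustration rather than taken from the state data. The standard deviation uses the n - 1 denominator, which we assume matches the sample standard deviation SPSS reports.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation (n - 1 denominator), minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, divides by n - 1
        "min": min(values),
        "max": max(values),
    }


# Illustrative values only (not the 2007 state data)
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```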

2.1 The SPSS Procedure

When conducting a simple regression, it is often wise to examine each variable in isolation first. We start by presenting histograms of infant mortality and poverty rates across the states. This is done in SPSS by selecting from the menu:

Analyze → Descriptive Statistics → Explore

In the Explore dialog box that opens, move the infant mortality and poverty rate variables into the Dependent List: box. On the right-hand side of the Explore dialog box, click the “Plots” button. This opens another dialog box where you can select the plots you want to produce. For this example, just check “Histogram” under the Descriptive heading. Then click Continue and OK to perform the analysis.
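A histogram simply counts how many observations fall into each of a series of equal-width intervals. SPSS chooses the interval boundaries automatically; in the minimal sketch below the starting point and bin width are assumptions you supply, and the function name histogram_counts and the input values are hypothetical. Empty bins are kept so the gaps in the distribution remain visible.

```python
def histogram_counts(values, start, width):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if k < 0:
            raise ValueError("start must not exceed the smallest value")
        counts[k] += 1
    # Pair each bin's lower edge with its count
    return [(start + k * width, c) for k, c in enumerate(counts)]


# Made-up values; print a simple text histogram
for lower, count in histogram_counts([1.2, 1.7, 2.5, 2.9, 3.1, 6.0], start=1.0, width=1.0):
    print(f"{lower:5.1f} | {'#' * count}")
```

A long run of bars to the right of the bulk of the data, as in the last bin here, is what a positive skew looks like in a histogram.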

Screenshots for the procedure to produce histograms in SPSS are available in the How-to Guides for the Dispersion of a Continuous Variable topic.

To estimate a simple regression model in SPSS, select from the menu:

Analyze → Regression → Linear

In the Linear Regression dialog box that opens, move the infant mortality variable (infantmort) into the Dependent: box and move the poverty rate variable (poverty) into the Independent(s): box.

Figure 1 shows what this looks like in SPSS.

Figure 1: Selecting simple regression from the Analyze menu in SPSS.

Once you are done, click OK to perform the analysis.

2.2 Exploring the SPSS Output

Figures 2 and 3 present the histograms for infant mortality and poverty rates across the 50 U.S. states plus Washington, DC, respectively.

Figure 2: Histogram showing the distribution of infant mortality measured in 2007 across the 50 U.S. states plus Washington, DC, U.S. Statistical Abstracts, 2012.

Figure 2 shows that most values for infant mortality cluster near the mean of about 7. However, the distribution has a slight positive skew, meaning there are a few more extreme values above the mean than below it. Further diagnostics might be warranted to make sure those observations are not unduly influencing the results of the regression.
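A "slight positive skew" can be put on a numeric scale with a skewness coefficient, where positive values indicate a longer right tail. The minimal sketch below computes the adjusted Fisher-Pearson coefficient, which to our understanding is the skewness statistic SPSS's Explore and Descriptives procedures report. The function name sample_skewness and the inputs are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data: 0.0
print(sample_skewness([1, 1, 1, 2, 10]))  # long right tail: positive
```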

Figure 3: Histogram showing the distribution of the poverty rate measured in 2007 across the 50 U.S. states plus Washington, DC, U.S. Statistical Abstracts, 2012.

Figure 3 shows that state poverty rates cluster somewhat near their mean of about 12.7. There is a slight positive skew, but a fair number of observations also fall below the mean. Further diagnostics might be warranted to make sure the few observations with larger values are not unduly influencing the results of the regression.
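One common first screen for "observations with larger values" is to flag anything above Q3 + 1.5 * IQR, where IQR is the interquartile range. The minimal sketch below applies that rule using statistics.quantiles. Quartile conventions differ between software, so the fence may not match the one in an SPSS boxplot exactly. The function name flag_high_values and the data are hypothetical.

```python
import statistics


def flag_high_values(values, k=1.5):
    """Return values above Q3 + k * IQR, a common screening rule for unusually large values."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    upper_fence = q3 + k * (q3 - q1)
    return [v for v in values if v > upper_fence]


print(flag_high_values([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))  # [50]
```

A flagged value is a prompt for closer inspection, not proof that the case distorts the regression.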

Figure 4 presents four tables of results that are produced by the simple linear regression procedure in SPSS. The fourth table, outlined in red, includes the results of primary interest.

Figure 4: Simple regression of infant mortality rate measured in 2007 regressed on poverty rate measured in 2007 for the 50 U.S. states plus Washington, DC, U.S. Statistical Abstracts, 2012.

The first three tables in Figure 4 report the independent variable(s) entered into the model, some summary fit statistics for the regression model, and an analysis of variance for the model as a whole. While detailed examination of these tables is beyond the scope of this example, we note in the second table that R-Square measures the proportion of the variance in the dependent variable explained by the model, which in this case consists of a single independent variable. An R-Square of 0.289 means that almost 29 percent of the variance in the infant mortality rate across the states is accounted for by the poverty rate.
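R-Square compares the squared residuals around the fitted line with the squared deviations around the mean: R-Square = 1 - SS_residual / SS_total. In simple regression it also equals the squared correlation between the two variables. As a rough consistency check with the rounded figures reported in this guide, r = b * sd(x) / sd(y) = 0.26 * 3.12 / 1.50, or about 0.54, and 0.54 squared is about 0.29, close to the reported 0.289 (the small gap reflects rounding of the slope). The minimal sketch below computes R-Square from data and a fitted line; the function name r_squared and the inputs are hypothetical, and the formula assumes the intercept and slope are the OLS estimates.

```python
def r_squared(x, y, intercept, slope):
    """R-Square = 1 - SS_residual / SS_total, assuming intercept and slope are OLS estimates."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)  # variation around the mean
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Made-up data whose OLS fit is intercept 2.0, slope 0.7
print(r_squared([1, 2, 3, 4], [2, 4, 5, 4], intercept=2.0, slope=0.7))
```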

The fourth table in Figure 4, outlined in red, presents the estimates of the intercept, or constant, and the slope coefficient. The estimated intercept is approximately 3.78. The constant of a simple regression model can be interpreted as the expected value of the dependent variable when the independent variable equals zero. In this case, no state has a poverty rate anywhere near zero (the observed minimum is 7.1 percent), so interpreting the constant means extrapolating far outside the data, and by itself it does not tell us much. Researchers rarely base predictions on the intercept, so it often receives little attention.
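Plugging the reported (rounded) estimates into the fitted equation makes the point concrete: the constant is just the prediction at a poverty rate of zero, a value outside the observed range of 7.1 to 20.6 percent. The minimal sketch below hard-codes the estimates and range quoted in this guide; the function names are hypothetical. It also shows that the line passes through the point of means, since 3.78 + 0.26 * 12.67 is about 7.07, the mean infant mortality rate.

```python
INTERCEPT = 3.78  # reported constant (rounded)
SLOPE = 0.26      # reported poverty coefficient (rounded)
POVERTY_MIN, POVERTY_MAX = 7.1, 20.6  # observed range of the poverty rate (percent)


def predict_infant_mortality(poverty_rate):
    """Expected infant deaths per 1000 births at a given poverty rate (percent)."""
    return INTERCEPT + SLOPE * poverty_rate


def within_observed_range(poverty_rate):
    """True if the poverty rate lies inside the range seen in the data."""
    return POVERTY_MIN <= poverty_rate <= POVERTY_MAX


for rate in (0.0, 7.1, 12.67, 20.6):
    flag = "" if within_observed_range(rate) else "  (extrapolation)"
    print(f"poverty {rate:5.2f}% -> {predict_infant_mortality(rate):.2f}{flag}")
```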

The fourth table reports that the estimated slope coefficient linking the state poverty rate to the state infant mortality rate is approximately 0.26. This represents the average marginal effect of the poverty rate on the infant mortality rate and can be interpreted as the expected change, on average, in the dependent variable for a one-unit increase in the independent variable. For this example, each one-percentage-point increase in the share of a state’s population living in poverty is associated with an average increase of 0.26 infant deaths per 1000 births. The table also reports that this estimate is statistically significantly different from zero. This leads us to reject the null hypothesis that the slope equals zero and conclude that there does appear to be a positive relationship between poverty rates and infant mortality across the U.S. states.
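The significance test for the slope divides the estimate by its standard error: t = b / SE(b), with SE(b) = sqrt((SS_residual / (n - 2)) / Sxx). SPSS reports t and converts it to a p-value in the Sig. column using a t distribution with n - 2 degrees of freedom. The minimal sketch below stops at the t statistic because Python's standard library has no t-distribution function. The function name slope_t_statistic and the data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (t, df) for H0: slope = 0 in a simple OLS regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b / se_b, n - 2


t, df = slope_t_statistic([1, 2, 3, 4], [2, 4, 5, 4])
print(f"t = {t:.3f} on {df} degrees of freedom")
```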

There are multiple diagnostic tests researchers might perform after estimating a simple regression model, to evaluate whether the model appears to violate any of the OLS assumptions or whether there are other problems, such as particularly influential cases. Describing all of these diagnostic tests is well beyond the scope of this example.
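As one illustration of an influence diagnostic, the minimal sketch below computes each case's leverage h (how unusual its x value is) and Cook's distance D (how much the fitted line would change if the case were dropped) for a simple regression, using h_i = 1/n + (x_i - mean(x))^2 / Sxx and D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2 with p = 2 parameters. SPSS can save similar quantities from the Linear Regression dialog; to our understanding its saved leverage values are centered (h minus 1/n), so they will differ from h here by 1/n. The function name influence_measures and the data are hypothetical.

```python
def influence_measures(x, y):
    """Leverage h_i and Cook's distance D_i for each case in a simple OLS regression."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual mean square
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage
        d = (e ** 2 / (p * s2)) * h / (1 - h) ** 2  # Cook's distance
        out.append((h, d))
    return out


# Made-up data: the last case has an extreme x and sits far below the trend
for h, d in influence_measures([1, 2, 3, 4, 5, 10], [1, 2, 3, 4, 5, 2]):
    print(f"leverage {h:.3f}  Cook's D {d:.3f}")
```

In simple regression the leverages always sum to 2, the number of estimated parameters, which is a handy check on the arithmetic.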