How-to Guide for IBM® SPSS® Statistics Software

Introduction

In this guide you will learn how to produce a two-way scatter plot of two continuous variables. You will find links to the example dataset and you are encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have already opened the data file in IBM® SPSS® Statistics software (SPSS).

Contents

- 1 Two-Way Scatter Plot
- 2 An Example in SPSS: Male and Female Smoking Rates in the U.S. States
  - 2.1 The SPSS Procedure
  - 2.2 Exploring the SPSS Output
- 3 Your Turn

1 Two-Way Scatter Plot

A two-way scatter plot is a graphical method used to explore the relationship between two continuous variables. One of the variables defines the horizontal axis (often called the X-axis) of the plot and the other variable defines the vertical axis (often called the Y-axis) of the plot. Each observation in the dataset is plotted within the figure based on the values it has for both of the variables. In other words, the values of the two variables under consideration define the coordinates, or location, within the figure where each case will be plotted.
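To make the idea of coordinates concrete, here is a minimal Python sketch, separate from the SPSS procedure, that pairs two variables into (x, y) points and marks each point on a coarse text grid. The values and the names `x_values`, `y_values`, `to_points`, and `text_scatter` are hypothetical illustrations; they are not the smoking data used in this guide.

```python
# Hypothetical values for five cases; NOT taken from the example dataset.
x_values = [10.0, 15.0, 20.0, 25.0, 30.0]
y_values = [8.0, 12.0, 15.0, 21.0, 24.0]


def to_points(xs, ys):
    """Pair the two variables so each case becomes one (x, y) coordinate."""
    if len(xs) != len(ys):
        raise ValueError("both variables need one value per case")
    return list(zip(xs, ys))


def text_scatter(points, width=21, height=9):
    """Render points on a coarse character grid (X across, Y upward)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1.0 if a variable has no variation.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


points = to_points(x_values, y_values)
print(text_scatter(points))
```

Each asterisk corresponds to one case: its column is set by the X variable and its row by the Y variable, which is the same logic a scatter plot applies at much finer resolution.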

Researchers construct two-way scatter plots to explore the relationship between two variables, to identify observations that are outliers, and as a diagnostic tool for more complicated statistical methods such as multiple regression.

2 An Example in SPSS: Male and Female Smoking Rates in the U.S. States

This example explores the relationship between the smoking rates of men and women across the 50 U.S. states plus Washington, DC, measured in 2009. The two variables are:

- Smoking rate among men across the states, measured as a percentage (smokemale).
- Smoking rate among women across the states, measured as a percentage (smokefemale).

Both of these are continuous variables, making them appropriate for producing a two-way scatter plot.

2.1 The SPSS Procedure

You can produce a two-way scatter plot in SPSS in several ways, one of which we explore in detail here. Begin by selecting from the menu:

Graphs → Chart Builder

This opens a Chart Builder dialog box. Figure 1 shows what this looks like in SPSS.

(Note: You may get a message asking you to define the level of measurement for variables in your dataset before you can proceed to building the chart. If you get this message, typically you can select “OK” and then “Scan Data” and SPSS will execute this task for you automatically.)

Figure 1: Selecting Chart Builder from the Graphs menu in SPSS.

In the lower half of the Chart Builder dialog box, select “Scatter/Dot” from the lower left menu. Eight icons for scatter plots appear. The one on the top left is for a simple scatter plot. Drag and drop that icon up into the open window in the upper half of this dialog box where the instructions say “Drag a Gallery chart here…”.

When you do, an Element Properties dialog box will open. You can ignore that for this example.

Next, drag and drop the variable you want plotted on the X-axis (smokemale in this example) into the text box labeled “X-Axis?”. Then drag and drop the smokefemale variable into the text box labeled “Y-Axis?”. Note that, if variable labels are available, SPSS will display them (e.g., “Percent Males Smoked 2009” and “Percent Females Smoked 2009” in this case) instead of the actual variable names. Figure 2 shows what this looks like in SPSS.

Figure 2: Building a two-way scatter plot in the Chart Builder dialog box in SPSS.

Once done, click OK to produce the plot.

2.2 Exploring the SPSS Output

Executing the procedure above using the variable named smokemale on the X-axis and the variable named smokefemale on the Y-axis produces the scatter plot shown in Figure 3.

Figure 3 shows a clear relationship between these two variables. Relatively higher smoking rates among men are associated with relatively higher smoking rates among women as well. The plot in Figure 3 shows a positive relationship between these two variables that appears to be approximately linear. Two common statistical measures of correlation (the Pearson correlation coefficient and the Spearman rank-order correlation coefficient) can be computed to measure the association between two variables (see the SAGE Research Methods Datasets modules on these methods). The Pearson correlation coefficient assumes a linear relationship between the two variables in question, while the Spearman rank-order correlation coefficient assumes only a monotonic relationship. Figure 3 suggests that using the Pearson correlation coefficient to measure the association between these two variables would be appropriate.

Figure 3: Two-way scatter plot of male and female smoking rates across the 50 U.S. states plus Washington, DC, measured in 2009. Source: U.S. Statistical Abstracts, 2012.
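The difference between the linear assumption of the Pearson coefficient and the monotonic assumption of the Spearman coefficient can be illustrated with a short Python sketch. This is not SPSS output and not the guide's own method; the functions `pearson`, `ranks`, and `spearman` are hypothetical helper names, and the data are made-up values chosen to be monotonic but curved, not the smoking rates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length variables with at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data (NOT the smoking rates): y rises with x, but along a curve.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because the made-up relationship is perfectly monotonic, the Spearman coefficient reaches its maximum, while the Pearson coefficient falls short of 1 because the points do not lie on a straight line. When a scatter plot looks approximately linear, as in Figure 3, the two measures would be expected to be closer to each other.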

3 Your Turn

Download this sample dataset and see if you can replicate the results. Then repeat the process using the two variables named uninsuredall and uninsuredchild, which measure the percentage of all persons and the percentage of children, respectively, who did not have health insurance in 2009.

IBM® SPSS® Statistics software (SPSS) screenshots Republished Courtesy of International Business Machines Corporation, © International Business Machines Corporation. SPSS Inc. was acquired by IBM in October, 2009. IBM, the IBM logo, ibm.com, and SPSS are trademarks or registered trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “IBM Copyright and trademark information” at http://www.ibm.com/legal/copytrade.shtml.