How-to Guide for IBM® SPSS® Statistics Software
Introduction

In this guide you will learn how to produce a two-way scatter plot of two continuous variables. You will find links to the example dataset and you are encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have already opened the data file in IBM® SPSS® Statistics software (SPSS).

Contents
• 1 Two-Way Scatter Plot
• 2 An Example in SPSS: 4th Grade Reading and Math Proficiency in the U.S. States
• 2.1 The SPSS Procedure
• 2.2 Exploring the SPSS Output

1 Two-Way Scatter Plot

A two-way scatter plot is a graphical method used to explore the relationship between two continuous variables. One of the variables defines the horizontal axis (often called the x-axis) of the plot and the other variable defines the vertical axis (often called the y-axis) of the plot. Each observation in the dataset is plotted within the figure based on the values it has for both of the variables. In other words, the values of the two variables under consideration define the coordinates, or location, within the figure where each case will be plotted.
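
To make the idea of coordinates concrete, the short Python sketch below pairs each case's two values into an (x, y) point. It runs outside SPSS and uses made-up values, not the guide's dataset. The names `reading`, `math_scores`, and `to_points` are illustrative assumptions only.

```python
# Hypothetical values for four cases; these are NOT taken from the example dataset.
reading = [62.0, 71.5, 80.0, 55.5]      # values plotted on the x-axis
math_scores = [65.0, 70.0, 84.5, 58.0]  # values plotted on the y-axis


def to_points(x_values, y_values):
    """Pair the two variables case by case into (x, y) coordinates."""
    if len(x_values) != len(y_values):
        raise ValueError("Both variables need one value per case.")
    return list(zip(x_values, y_values))


points = to_points(reading, math_scores)
for case_number, (x, y) in enumerate(points, start=1):
    print(f"Case {case_number}: plotted at x = {x}, y = {y}")
```

Each printed line corresponds to one dot in the scatter plot: the first variable sets how far right the dot sits, and the second sets how high it sits.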

Researchers construct two-way scatter plots to explore the relationship between two variables, to identify observations that are outliers, and as a diagnostic tool for more complicated statistical methods such as multiple regression.

2 An Example in SPSS: 4th Grade Reading and Math Proficiency in the U.S. States

This example uses two variables from the Consolidated State Performance Report from the U.S. Department of Education measured for the 2011–2012 school year across the 50 U.S. states plus the District of Columbia and Puerto Rico:

• The percentage of 4th grade students in each state who are proficient in reading (g4read201112).
• The percentage of 4th grade students in each state who are proficient in math (g4math201112).

Both of these are continuous variables, making them appropriate for producing a two-way scatter plot.

2.1 The SPSS Procedure

You can produce a two-way scatter plot in SPSS in several ways, one of which we explore in detail here. Begin by selecting from the menu:

Graphs → Chart Builder

This opens a Chart Builder dialog box. Figure 1 shows what this looks like in SPSS.

(Note: you may get a message asking you to define the level of measurement for variables in your dataset before you can proceed to building the chart. If you get this message, typically you can select “OK” and then “Scan Data” and SPSS will execute this task for you automatically.)

Figure 1: Selecting Chart Builder from the Graphs menu in SPSS.

In the lower half of the Chart Builder dialog box, select “Scatter/Dot” on the lower left menu. Eight icons for scatter plots appear. The one on the top left is for a simple scatter plot. Drag and drop that icon up into the open window in the upper half of this dialog box where the instructions say “Drag a Gallery chart here…”.

When you do, an Element Properties dialog box will open. You can ignore this dialog box for this example.

Next, drag and drop the variable you want plotted on the x-axis into the text box labeled “X-Axis?”. For this example, the variable is g4read201112. Then drag and drop the g4math201112 variable into the text box labeled “Y-Axis?”. Note that SPSS will display the variable labels (“G4 Reading 2011-12” and “G4 Math 2011-12” in this case) rather than the actual variable names if such labels are available. Figure 2 shows what this looks like in SPSS.

Figure 2: Building a two-way scatter plot in the Chart Builder dialog box in SPSS.

Once done, click OK to produce the plot.

2.2 Exploring the SPSS Output

Executing the procedure above using the variable named g4read201112 on the x-axis and the variable named g4math201112 on the y-axis produces the scatter plot shown in Figure 3.

Figure 3: Two-way scatter plot of 4th grade reading and math proficiency rates for the 2011–2012 school year across the 50 U.S. states plus the District of Columbia and Puerto Rico, Consolidated State Performance Report from the U.S. Department of Education.

Figure 3 shows a clear positive relationship between these two variables that appears to be linear: relatively higher percentages of 4th grade students proficient in math are associated with relatively higher percentages proficient in reading.

Two common statistical measures of correlation, the Pearson correlation coefficient and the Spearman rank-order correlation coefficient, can be computed to measure the association between two continuous variables (see the SAGE Research Methods Datasets modules on these methods). The Pearson correlation coefficient assumes a linear relationship between the two variables in question, while the Spearman rank-order correlation coefficient only assumes a monotonic relationship. Because the pattern in Figure 3 appears linear, using the Pearson correlation coefficient to measure the association between these two variables would be appropriate.
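
The difference between a linear and a merely monotonic relationship can be illustrated with a small Python sketch. This is not part of the SPSS procedure and does not use the guide's dataset. The data are made up, and the function names `pearson`, `average_ranks`, and `spearman` are illustrative assumptions written from the textbook definitions of the two coefficients.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear association."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: y rises every time x rises, but along a curve (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

For these made-up values the Spearman coefficient equals 1 because the relationship is perfectly monotonic, while the Pearson coefficient falls slightly below 1 because the points follow a curve rather than a straight line. When a scatter plot looks linear, as Figure 3 does, the two coefficients tend to tell a similar story.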