
Pearson Correlation Coefficient

Karl Pearson (1857–1936) is credited with establishing the discipline of mathematical statistics. Building on earlier work by Francis Galton (1822–1911), one of Pearson’s major contributions to the field was the development of the Pearson product-moment correlation coefficient (or Pearson correlation, for short), which is often denoted by r. The correlation is one of the most common and useful statistics. The Pearson correlation, a measure of the relationship between two variables (typically continuous ones), is used throughout quantitative research in education and the social sciences. This entry describes what the Pearson correlation is; the steps used to calculate it; how to interpret its size, direction, and level of statistical significance; its data assumptions; and its limitations.

What Is the Pearson Correlation?

Put simply, the Pearson correlation is a measure of the linear relationship between two variables, X and Y. It takes a value from −1.0 to +1.0, where +1.0 indicates a perfect positive linear relationship, 0.0 (zero) indicates no linear relationship, and −1.0 indicates a perfect negative linear relationship. Figure 1 illustrates examples of the data distributions associated with five Pearson correlations.
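The definition can be made concrete with a short Python sketch. It computes r as the sum of cross-products of deviations from the means divided by the square root of the product of the two sums of squared deviations, which is the standard product-moment definition. The function name `pearson_r` and the sample values are hypothetical, chosen only to reproduce the boundary cases of +1.0 and −1.0 and a case of zero correlation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the paired deviations from the means
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sum of squared deviations for each variable
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sp / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive linear relationship)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative linear relationship)
print(pearson_r(x, [2, 5, 3, 1, 4]))   # 0.0  (no linear relationship)
```

Because the cross-products are divided by both spreads, r does not depend on the units of X or Y, which is why it is bounded by −1.0 and +1.0.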

Figure 1 Example of five Pearson product-moment correlation coefficients

Importantly, where correlational estimates are concerned, there is no attempt to designate one variable as independent and the other as dependent. Therefore, relationships identified using correlation coefficients should be interpreted for what they are: associations, not causal relationships. Arriving at a Pearson correlation value (r) for two variables of interest involves a series of calculations and logical steps. To illustrate these steps, a fictional example involving two educational variables follows.

Calculation of the Pearson Correlation Coefficient

Suppose you are the head of curriculum at a small English as a Second or Other Language institute in Auckland, New Zealand. A new cohort of intermediate-level English as a Second or Other Language students arrives every 10 weeks to participate in your program. The cohort flies to Auckland from various places in the Asia-Pacific region: nearby in Polynesia, farther away in Micronesia, even farther in Southeast Asia, and at points beyond in East Asia. As one of the teachers on the course, you notice a trend: despite exhibiting equivalent levels of English fluency, the students who come from farther abroad tend to have more limited knowledge of New Zealand, its culture, and its customs, and they often struggle with course material that integrates such content. To better understand and cater to the needs of particular student groups, you would like to explore the statistical relationship between (a) the distance that students travel to get to New Zealand and (b) their general knowledge of New Zealand.

To illustrate the steps taken to calculate a Pearson correlation, a fictional educational data set will be used, consisting of a sample from one intake: 10 international students (N = 10; Table 1, ID column). The time that it takes each student to fly directly to Auckland, New Zealand, the location of the course, can serve as a proxy measure of each student’s distance of travel to New Zealand. The flight times are presented in Table 1, flight time column (X). On the first day of the course, the students sit a 10-item general knowledge test about New Zealand. The test results, scored out of 10, are also presented in Table 1, test score column (Y).

...
