
Errors of Measurement: Regression Toward the Mean

In 1886, Francis Galton published an article titled “Regression Towards Mediocrity in Hereditary Stature.” Interested in heredity, Galton had obtained measurements on heights of 205 sets of parents and their 913 adult children. He noticed that if he selected families where the parents were tall, the average height of the children was less than that of their parents, whereas if he selected families where the parents were short, the average height of the children was greater. Galton called this “regression towards mediocrity”; it is now known as “regression towards the mean,” as the term mediocrity has acquired disparaging connotations.

The same thing happens when families are selected by the height of the children: for tall children, the mean height of their parents is less than that of the children; for short children, the mean height of their parents is greater. This is a statistical, not a genetic, phenomenon. This entry discusses how regression toward the mean works, providing several examples.

How Regression Toward the Mean Works

Galton’s data were quite complicated, with adjustment for gender and multiple children per family. In this entry, a much simpler data set is presented to see how regression works: pulse rate for 185 students, each student measured by two other students. The data are shown in Figure 1. This figure also shows lines through the means of the first and second measurement and the line of equality, on which the points would lie if the two measurements were identical. The horizontal and vertical lines cross very close to the line of equality, because the means of the first and second measurements are almost the same, 72.6 and 73.3 beats per minute (b/min), respectively. The spread of the distributions is almost the same, too. The minima are 45 and 46 b/min, the maxima are both 108 b/min, and the standard deviations are 10.4 and 9.8 b/min.

Because the two pulse measurements were conducted during the same practical class, they should be the same, except for measurement error. What is the mean second pulse measurement for students whose first pulse is 60 b/min? Will it be 60 b/min? Not many first measurements are exactly 60, so all measurements between 55 and 65 b/min are considered. As Figure 2 shows, the mean second pulse is greater than 60 b/min; it is 66.2 b/min, closer to the mean than is 60 b/min.
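The windowed calculation described above can be sketched in Python. The class data are not reproduced in this entry, so the sketch uses simulated pairs instead: each hypothetical student has a true pulse, and each of the two measurements adds independent error. The true mean of 73 b/min, the true standard deviation of 9 b/min, the error standard deviation of 5 b/min, and the function names are all assumptions chosen for illustration, not values taken from the data.

```python
import random
import statistics


def simulate_pairs(n, true_mean=73.0, true_sd=9.0, error_sd=5.0, seed=1):
    """Simulate n pairs of pulse measurements (assumed model, not the class data).

    Each student has one true pulse; each measurement adds independent error.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_pulse = rng.gauss(true_mean, true_sd)
        first = true_pulse + rng.gauss(0.0, error_sd)
        second = true_pulse + rng.gauss(0.0, error_sd)
        pairs.append((first, second))
    return pairs


def mean_second_given_first(pairs, low, high):
    """Mean second measurement among pairs whose first measurement is in [low, high]."""
    selected = [second for first, second in pairs if low <= first <= high]
    return statistics.mean(selected)


pairs = simulate_pairs(5000)

# Mean of the first and second measurements for students selected on a low first pulse.
window_mean_first = statistics.mean([f for f, _ in pairs if 55 <= f <= 65])
window_mean_second = mean_second_given_first(pairs, 55, 65)
overall_mean_second = statistics.mean([s for _, s in pairs])

print(f"mean first pulse in window:  {window_mean_first:.1f}")
print(f"mean second pulse in window: {window_mean_second:.1f}")
print(f"overall mean second pulse:   {overall_mean_second:.1f}")
```

Under these assumptions, the mean second measurement in the selected group falls between the group's mean first measurement and the overall mean, which is the pattern the entry describes for the real data.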

The same calculation can be repeated for other ranges of the first pulse, as shown in Figure 3. The resulting means do not lie on the line of equality but on a line that crosses it, as shown in Figure 4.

Figure 1 Scatterplot of Pairs of Pulse Measurements by Two Different Observers on 185 Students


The means in Figure 3 lie approximately on the simple linear regression line. When statisticians estimate the line that best fits the data in a scatterplot like Figure 1, they find the line that best predicts the mean value of one of the variables, called the outcome, dependent, or y variable, from the observed value of the other, called the predictor, explanatory, independent, or x variable. The line chosen is the one that makes the differences between the observed values of the y variable and the mean values predicted by the line as small as possible. Specifically, it minimizes the sum of the squared differences between the observed and predicted values. The method has its roots in Galton’s article, hence the name regression line. The line shown in Figure 4 is called the regression of second pulse on first pulse.
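A minimal sketch of the least squares calculation follows, using the standard closed-form solution for a straight line. As the class data are not available here, it again runs on simulated pairs; the simulation parameters (true mean 73, true standard deviation 9, error standard deviation 5 b/min) and the name least_squares_line are assumptions made for illustration.

```python
import random


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared
    differences between observed ys and the values predicted from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products about the means, and sum of squares of x about its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Simulated pairs (assumed model): one true pulse per student, two noisy measurements.
rng = random.Random(1)
first, second = [], []
for _ in range(5000):
    true_pulse = rng.gauss(73.0, 9.0)
    first.append(true_pulse + rng.gauss(0.0, 5.0))
    second.append(true_pulse + rng.gauss(0.0, 5.0))

# Regression of second pulse on first pulse, and of first pulse on second pulse.
slope, intercept = least_squares_line(first, second)
reverse_slope, reverse_intercept = least_squares_line(second, first)

print(f"second on first: slope {slope:.2f}, intercept {intercept:.1f}")
print(f"first on second: slope {reverse_slope:.2f}, intercept {reverse_intercept:.1f}")
```

In this simulation both slopes come out below 1, so a first measurement far from the mean predicts a second measurement that is nearer to it, and the same holds with the roles of the two measurements reversed. The line of equality has slope 1, which is why the regression line crosses it.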

...
