
Biserial correlation coefficients are measures of association that apply when one of the observed variables takes on two numerical values (a binary variable) and the other variable is a measurement or a score. There are several biserial coefficients, with the appropriate choice depending on the underlying statistical model for the data. The point biserial correlation and Pearson's biserial correlation are arguably the most well known and most commonly used coefficients in practice. We will focus on these two coefficients but will discuss other approaches.

Karl Pearson developed the sample biserial correlation coefficient in the early 1900s to estimate the correlation $\rho_{YZ}$ between two measurements Z and Y when Z is not directly observed. Instead of Z, data are collected on a binary variable X, with X = 0 if Z falls below a threshold level and X = 1 otherwise. The numerical values assigned to X do not matter, provided the smaller value identifies when Z is below the threshold. In many settings, the latent variable Z is a conceptual construct and is not measurable. The sample point biserial correlation estimates the correlation $\rho_{YX}$ between Y and a binary variable X without reference to an underlying latent variable Z.
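The threshold model can be made concrete with a small simulation. This is a sketch under an assumption the text does not make here: that (Y, Z) is standard bivariate normal with correlation $\rho_{YZ}$. It dichotomizes Z at a threshold τ and compares the correlation of Y with the latent Z against its correlation with the observed binary X. Under that normal assumption, a direct calculation gives $\rho_{YX} = \rho_{YZ}\,\phi(\tau)/\sqrt{p(1-p)}$, where $\phi$ is the standard normal density and $p = P(Z \ge \tau)$; the code checks the simulated value against this. The function names `pearson`, `simulate`, and `theoretical_rho_yx` are hypothetical, chosen for illustration only.

```python
import math
import random


def pearson(a, b):
    """Product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def simulate(rho, threshold, n, seed=1):
    """Assumed model: (Y, Z) standard bivariate normal with corr(Y, Z) = rho.
    Only X = 1 if Z >= threshold (else 0) would be observed in practice."""
    rng = random.Random(seed)
    ys, zs, xs = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # Y = rho*Z + sqrt(1 - rho^2)*E, with E independent N(0, 1), gives corr(Y, Z) = rho.
        y = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        ys.append(y)
        zs.append(z)
        xs.append(1 if z >= threshold else 0)
    return pearson(ys, zs), pearson(ys, xs)


def theoretical_rho_yx(rho, threshold):
    """Population corr(Y, X) under the assumed bivariate normal model."""
    density = math.exp(-threshold ** 2 / 2) / math.sqrt(2 * math.pi)
    p = 0.5 * math.erfc(threshold / math.sqrt(2))  # P(Z >= threshold)
    return rho * density / math.sqrt(p * (1 - p))


r_yz, r_yx = simulate(rho=0.6, threshold=0.0, n=100_000)
print(f"r_YZ = {r_yz:.3f}, r_YX = {r_yx:.3f}, "
      f"theoretical rho_YX = {theoretical_rho_yx(0.6, 0.0):.3f}")
```

With a threshold at the median of Z, the correlation with the binary X comes out noticeably smaller than the correlation with the latent Z, which is why the two coefficients answer different questions.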

We will use S. Karelitz and colleagues' data on 38 infants to illustrate these ideas. A listing of the data is given in Table 1. The response Y is a child's IQ score at age 3, whereas X = 1 if the child's speech developmental level at age 3 is high, and X = 0 otherwise. The (population) biserial correlation $\rho_{YZ}$ is a reasonable measure of association when X is a surrogate for a latent continuum Z of speech levels. The (population) point biserial correlation $\rho_{YX}$ is more relevant when the relationship between IQ and the underlying Z scale is not of interest, or when the latent scale cannot be justified.

The Point Biserial Correlation

Assume that a random sample $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$ of n observations is selected from the (Y, X) population, where Y is continuous and X is binary. Let $s_{YX}$ be the sample covariance between the $y_i$ and the $x_i$, and let $s_Y^2$ and $s_X^2$ be the sample variances of the $y_i$ and the $x_i$, respectively. The population correlation $\rho_{YX}$ between Y and X is estimated by the sample point biserial correlation coefficient, which is simply the product-moment correlation between the Y and X samples:

$$r_{YX} = \frac{s_{YX}}{s_Y s_X}.$$

Table 1 Data for a Sample of 38 Children
X = 0 (n = 16) Y: 87 90 94 94 97 103 103 104 106 108 109 109 109 112 119 132
X = 1 (n = 22) Y: 100 103 103 106 112 113 114 114 118 119 120 120 124 133 135 135 136 141 155 157 159 162
Note: X = speech developmental level (0 = low; 1 = high), and Y = IQ score.
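As a sketch of the product-moment definition $r_{YX} = s_{YX}/(s_Y s_X)$, the code below applies it to the Table 1 data by coding X as 0/1 and computing the sample covariance and standard deviations with the n − 1 divisor. The variable names `y_low`, `y_high`, and the helper `sample_cov` are hypothetical, introduced only for this illustration.

```python
import math

# Table 1: IQ scores (Y) grouped by speech developmental level (X).
y_low = [87, 90, 94, 94, 97, 103, 103, 104, 106, 108, 109,
         109, 109, 112, 119, 132]                                   # X = 0
y_high = [100, 103, 103, 106, 112, 113, 114, 114, 118, 119, 120,
          120, 124, 133, 135, 135, 136, 141, 155, 157, 159, 162]  # X = 1

# Stack the groups and code X as 0/1 so an ordinary correlation applies.
y = y_low + y_high
x = [0] * len(y_low) + [1] * len(y_high)


def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


s_yx = sample_cov(y, x)
s_y = math.sqrt(sample_cov(y, y))
s_x = math.sqrt(sample_cov(x, x))
r_yx = s_yx / (s_y * s_x)  # point biserial = Pearson r between Y and 0/1 X
print(f"n = {len(y)}, r_YX = {r_yx:.3f}")
```

For these data the script should print n = 38 and r_YX = 0.557. The choice of divisor (n or n − 1) cancels in the ratio, as long as it is used consistently for the covariance and both variances.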

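The sample point biserial estimator $r_{YX}$ can also be expressed as

$$r_{YX} = \frac{\bar{y}_1 - \bar{y}_0}{s_Y}\sqrt{\frac{n_0 n_1}{n(n-1)}},$$

where $n_0$ and $n_1$ are the numbers of observations with X = 0 and X = 1 ($n = n_0 + n_1$), and $\bar{y}_0$ and $\bar{y}_1$ are the sample means of Y in those two groups.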
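The point biserial correlation can be written as a rescaled difference in group means, $r_{YX} = \frac{\bar{y}_1 - \bar{y}_0}{s_Y}\sqrt{n_0 n_1 / (n(n-1))}$, with $s_Y$ the sample standard deviation (n − 1 divisor) of all Y values. The sketch below, which assumes that form, checks numerically that it agrees with the direct product-moment calculation on the Table 1 data. The helper names are hypothetical, and the data are repeated so the block runs on its own.

```python
import math

# Table 1 data, repeated here so this sketch is self-contained.
y_low = [87, 90, 94, 94, 97, 103, 103, 104, 106, 108, 109,
         109, 109, 112, 119, 132]                                   # X = 0
y_high = [100, 103, 103, 106, 112, 113, 114, 114, 118, 119, 120,
          120, 124, 133, 135, 135, 136, 141, 155, 157, 159, 162]  # X = 1


def mean(v):
    return sum(v) / len(v)


def point_biserial_from_means(y0, y1):
    """(ybar1 - ybar0) / s_Y * sqrt(n0 * n1 / (n * (n - 1))),
    where s_Y is the n - 1 standard deviation of the pooled Y values."""
    n0, n1 = len(y0), len(y1)
    n = n0 + n1
    pooled = y0 + y1
    ybar = mean(pooled)
    s_y = math.sqrt(sum((v - ybar) ** 2 for v in pooled) / (n - 1))
    return (mean(y1) - mean(y0)) / s_y * math.sqrt(n0 * n1 / (n * (n - 1)))


def point_biserial_product_moment(y0, y1):
    """Same quantity computed directly as Pearson's r between Y and 0/1 X."""
    y = y0 + y1
    x = [0] * len(y0) + [1] * len(y1)
    my, mx = mean(y), mean(x)
    sxy = sum((a - my) * (b - mx) for a, b in zip(y, x))
    syy = sum((a - my) ** 2 for a in y)
    sxx = sum((b - mx) ** 2 for b in x)
    return sxy / math.sqrt(syy * sxx)


r_means = point_biserial_from_means(y_low, y_high)
r_direct = point_biserial_product_moment(y_low, y_high)
print(f"mean-difference form: {r_means:.6f}, product-moment: {r_direct:.6f}")
```

The two functions should agree up to floating-point rounding. The mean-difference form makes it explicit that $r_{YX}$ grows with the gap between the group means relative to the overall spread of Y.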

...
