Biserial Correlation Coefficients

Neil J. Salkind

doi:10.4135/9781412952644


Edited by:
Neil J. Salkind
In: Encyclopedia of Measurement and Statistics
Chapter DOI: https://doi.org/10.4135/9781412952644.n57
Subject: Anthropology, Business and Management, Criminology and Criminal Justice, Communication and Media Studies, Counseling and Psychotherapy, Economics, Education, Geography, Health, History, Marketing, Nursing, Political Science and International Relations, Psychology, Social Policy and Public Policy, Social Work, Sociology, Science, Technology, Computer Science, Engineering, Mathematics, Medicine


Biserial correlation coefficients are measures of association that apply when one of the observed variables takes on only two numerical values (a binary variable) and the other variable is a measurement or a score. There are several biserial coefficients, and the appropriate choice depends on the underlying statistical model for the data. The point biserial correlation and Pearson's biserial correlation are arguably the best known and most widely used coefficients in practice. We focus on these two coefficients but also discuss other approaches.

Karl Pearson developed the sample biserial correlation coefficient in the early 1900s to estimate the correlation ρYZ between two measurements Z and Y when Z is not directly observed. Instead of Z, data are collected on a binary variable X with X = 0 if Z falls below a threshold level and X = 1 otherwise. The numerical values assigned to X do not matter provided the smaller value identifies when Z is below the threshold. In many settings, the latent variable Z is a conceptual construct and not measurable. The sample point biserial correlation estimates the correlation ρYX between Y and a binary variable X without reference to an underlying latent variable Z.

We will use S. Karelitz and colleagues' data on 38 infants to illustrate these ideas. A listing of the data is given in Table 1. The response Y is a child's IQ score at age 3, whereas X = 1 if the child's speech developmental level at age 3 is high, and X = 0 otherwise. The (population) biserial correlation ρYZ is a reasonable measure of association when X is a surrogate for a latent continuum Z of speech levels. The (population) point biserial correlation ρYX is more relevant when the relationship between IQ and the underlying Z scale is not of interest, or when the latent scale cannot be justified.

The Point Biserial Correlation

Assume that a random sample (y1, x1), (y2, x2), …, (yn, xn) of n observations is selected from the (Y, X) population, where Y is continuous and X is binary. Let sYX be the sample covariance between the yi and the xi, and let s²Y and s²X be the sample variances of the yi and the xi, respectively. The population correlation ρYX between Y and X is estimated by the sample point biserial correlation coefficient, which is just the product-moment correlation between the Y and X samples:

$$ r_{YX} = \frac{s_{YX}}{s_{Y}\, s_{X}} $$

Table 1 Data for a Sample of 38 Children

X = 0 (n = 16), Y: 87, 90, 94, 94, 97, 103, 103, 104, 106, 108, 109, 109, 109, 112, 119, 132
X = 1 (n = 22), Y: 100, 103, 103, 106, 112, 113, 114, 114, 118, 119, 120, 120, 124, 133, 135, 135, 136, 141, 155, 157, 159, 162

Note: X = speech developmental level (0 = low; 1 = high), and Y = IQ score.

The sample point biserial estimator rYX can also be expressed as

...
