Skip to main content icon/video/no-internet

Correlation

If one wants to know the degree of a relationship, the correlation between two variables can be examined. Correlations can be quantified by computing a correlation coefficient. This entry first describes a concept central to correlation, covariance, and then discusses calculation and interpretation of correlation coefficients.

Covariance indicates the tendency in the linear relationship for two random variables to covary (or vary together) that is represented in deviations measured in the unstandardized units in which X and Y are measured. Specifically, it is defined as the expected product of the deviations of each of two random variables from its expected values or means.

The population covariance between two variables, X and Y, can be written by:

cov(X,Y)=E[(XE(X))(YE(Y))]=E[XYXE(Y)E(X)YE(X)E(Y)]=E(XY)E(X)E(Y)E(X)E(Y)E(X)E(Y)=E(XY)E(X)E(Y)

where E is the expected value or population mean.

Similarly, the sample covariance between x and y is given by:

s(x,y)=1N1i=1n(x)ix¯)(yiy¯)=1N1[i=1nxiyi(Ni=1xi)(Ni=1yi)N]

where N is the number of observations; x¯,y¯ are the sample means of x and y.

When one interprets covariances, zero covariances indicate that variables are not linearly related. If they are nonlinearly associated or statistically independent, the covariance is zero. On the other hand, a nonzero covariance indicates the tendency of covarying. If the sign of covariance is positive, the two variables tend to vary in the same direction. If a covariance value is negative, the two variables tend to move in the opposite direction. The covariance is not independent of the unit used to measure x and y, and so magnitude of the covariance depends on the measurement units of two variables. Note that the nonzero covariance does not indicate causation and how strong the association is between two variables.

When considering a variance–covariance matrix, covariances or correlations among observed variables depend on the relationship between latent variables and linear composite variables (i.e., tests consisting of more than one item). The covariance between two composite variables is the sum of the elements of the covariance matrix. It can be written by:

σ(X,Y)=σ(X,Y)=i=1pj=1qσ(Xi,Yj)

where p, q are the numbers of variable in X and Y, respectively.

This most commonly computed correlation coefficient is a standardized index of linear association. From the covariance, the correlation coefficient for X and Y is calculated using the following equation:

r=i=1NxiyiN1s(x,y)=i=1Nxiyi(sXsY)

The product moment correlation coefficient was originally invented by Karl Pearson in 1895 based on the studies conducted by Francis Galton and J. D. Hamilton Dickson in the 1880s. The correlation coefficient ranges from −1 to 1. If its value is 0, the variables have no linear relationship; and if its value is −1 or 1, each variable is perfectly predicted by the other. Its sign indicates the direction of the relationship.

Hyun Joo Jung Jennifer Randall
10.4135/9781506326139.n159

Further Readings

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah, NJ: Erlbaum.
Galton, F. (1888). Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal Society of London, 45, 135145.
Galton, F. (1889). Natural inheritance. London, UK: Macmillan.
Galton, F., & Dickson, J. H. (1886). Family likeness in

...

  • Loading...
locked icon

Sign in to access this content

Get a 30 day FREE TRIAL

  • Watch videos from a variety of sources bringing classroom topics to life
  • Read modern, diverse business cases
  • Explore hundreds of books and reference titles

Sage Recommends

We found other relevant content for you on other Sage platforms.

Loading