
Generally speaking, the notion of correlation is very close to that of association. It is the statistical association between two variables of interval or ratio measurement level.

To begin with, correlation should not be confused with causal effect. Indeed, statistical research into a causal effect using only two variables is impossible, at least in observational research. Even in extreme cases of so-called unicausal effects, such as infection with the bacterium Treponema pallidum and contracting syphilis, other variables always come into play, contributing to, modifying, or counteracting the effect. The presence of a causal effect can only be sorted out in a multivariate context and is consequently more complex than correlation analysis.

Let us limit ourselves to two interval variables, denoted as X and Y, and let us leave causality aside. We assume for the moment a linear model. It is important to note that there is a difference between the correlation coefficient, often indicated as r, and the unstandardized regression coefficient b (computer output: B). The latter merely indicates the slope of the regression line and is computed as the tangent of the angle formed by the regression line and the x-axis. With income (X) and consumption (Y) as variables, b is the consumption quote (the marginal propensity to consume), which is the additional amount one spends after obtaining one unit of extra income. In other words, the change in Y per additional unit of X is b = ΔY/ΔX. We will see below that there are, in fact, two such regression coefficients (Y regressed on X and X regressed on Y) and that the correlation coefficient is their geometric mean, carrying the sign the two slopes share.
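
The relation between r and the two regression coefficients can be checked numerically. The sketch below is only an illustration: the income and consumption figures are invented for the example, and the helper names (covariance, slope, pearson_r) are hypothetical and not taken from any particular package.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def slope(xs, ys):
    """Unstandardized regression coefficient b for regressing ys on xs."""
    return covariance(xs, ys) / covariance(xs, xs)

def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Invented illustrative data (not from the source): income X and consumption Y.
income = [10, 12, 15, 18, 20, 25, 30]
consumption = [9, 10, 13, 14, 17, 19, 24]

b_yx = slope(income, consumption)   # change in Y per extra unit of X
b_xy = slope(consumption, income)   # change in X per extra unit of Y
r = pearson_r(income, consumption)

# Geometric mean of the two slopes, given the sign the slopes share.
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"r = {r:.4f}, geometric mean of slopes = {r_from_slopes:.4f}")
```

Unlike b, which depends on the units of X and Y, r is unit-free: rescaling income changes both slopes in opposite directions and leaves their product unchanged.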

Five Main Features of a Correlation

Starting from probabilistic correlations, each correlation has five main features:

  • nature,
  • direction,
  • sign,
  • strength,
  • statistical generalization capacity.

1. Nature of the Correlation

The nature of the correlation is linear for the simple correlation computation suggested above. This means that a linear function of the form $E(Y) = b_0 + b_{y1} X_1$ is fitted through the scatterplot. Behind a correlation coefficient of, for example, r = 0.40 is a linear model. Many researchers are fixated on the number between 0 and 1 or between 0 and −1, and they tend to forget this background model. Often, they unconsciously treat the linear model as a tacit assumption. They do not seem to realize that nonlinearity occurs frequently.

An example of nonlinearity is the correlation between the percentage of Catholics and the percentage of CDU/CSU voters in former West Germany (Christlich-Demokratische Union/Christlich-Soziale Union). One might expect a linear correlation: the more Catholics, the more voters of CDU/CSU according to a fixed pattern. However, this “the more, the more” pattern only seems to be valid for communes with many Catholics. For communes with few Catholics, the correlation turns out to be fairly negative: The more Catholics, the fewer voters for CDU/CSU. Consequently, the overall scatterplot displays a U pattern. At first, it drops, and from a certain percentage of Catholics onwards, it starts to rise. The quadratic function that describes a parabola therefore shows a better fit with the scatterplot than the linear function.
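
How a straight line can understate a U-shaped relationship can be sketched with a small least-squares comparison. The percentages below are invented solely to mimic a U pattern; they are not the West German data referred to above. The functions fit_polynomial and r_squared are hypothetical helpers written for this illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first),
    obtained by solving the normal equations with Gauss-Jordan elimination."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [ar - factor * ac for ar, ac in zip(a[row], a[col])]
                v[row] -= factor * v[col]
    return [v[i] / a[i][i] for i in range(n)]

def r_squared(xs, ys, coeffs):
    """Share of the variance in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented illustrative data: percentage of Catholics and percentage of
# CDU/CSU voters per commune, shaped to first fall and then rise.
pct_catholic = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
pct_cdu_csu = [48, 42, 37, 34, 33, 34, 37, 41, 47, 54, 62]

linear = fit_polynomial(pct_catholic, pct_cdu_csu, 1)
quadratic = fit_polynomial(pct_catholic, pct_cdu_csu, 2)

r2_linear = r_squared(pct_catholic, pct_cdu_csu, linear)
r2_quadratic = r_squared(pct_catholic, pct_cdu_csu, quadratic)

print(f"R^2 linear:    {r2_linear:.3f}")
print(f"R^2 quadratic: {r2_quadratic:.3f}")
```

With data of this shape, the linear fit explains only a modest share of the variance, while the parabola follows the points closely. A positive coefficient on the squared term corresponds to the "first drops, then rises" pattern.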

...
