Correlation
Generally speaking, the notion of correlation is very close to that of association. It is the statistical association between two variables of interval or ratio measurement level.
To begin with, correlation should not be confused with causal effect. Indeed, statistical research into a causal effect using only two variables is impossible, at least in observational research. Even in extreme cases of so-called unicausal effects, such as infection with the bacterium Treponema pallidum and contracting syphilis, there are always other variables that come into play, contributing to, modifying, or counteracting the effect. The presence of a causal effect can only be sorted out in a multivariate context and is consequently more complex than correlation analysis.
Let us limit ourselves to two interval variables, denoted as X and Y, and let us leave causality aside. We assume for the moment a linear model. It is important to note that there is a difference between the correlation coefficient, often indicated as r, and the unstandardized regression coefficient b (computer output: B). The latter merely indicates the slope of the regression line and is computed as the tangent of the angle formed by the regression line and the x-axis. With income (X) and consumption (Y) as variables, the slope is the marginal consumption quota (the marginal propensity to consume), that is, the additional amount one spends after obtaining one extra unit of income. The change in Y per additional unit of X is therefore B = ΔY/ΔX. We will see below that there are, in fact, two such regression coefficients (Y regressed on X and X regressed on Y) and that the correlation coefficient is their geometric mean.
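The relation between r and the two regression coefficients can be checked numerically. The following is a minimal sketch in Python, not part of the original entry: the income and consumption figures are invented for illustration, and the helper names (covariance, slope, pearson_r) are hypothetical.

```python
from math import copysign, sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def slope(xs, ys):
    """Unstandardized regression coefficient b for ys regressed on xs."""
    return covariance(xs, ys) / covariance(xs, xs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Invented illustration data (assumption): income X and consumption Y.
income = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
consumption = [9.0, 10.5, 12.0, 15.0, 15.5, 19.0]

b_yx = slope(income, consumption)  # change in Y per extra unit of X
b_xy = slope(consumption, income)  # change in X per extra unit of Y
r = pearson_r(income, consumption)

# Geometric mean of the two slopes, carrying the sign they share.
r_from_slopes = copysign(sqrt(b_yx * b_xy), b_yx)

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"r = {r:.4f}, geometric mean of slopes = {r_from_slopes:.4f}")
```

The two slopes differ because they are expressed in different units, whereas r is symmetric in X and Y. The product of the slopes equals r squared, which is why the geometric mean recovers r once the common sign is attached.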
Five Main Features of a Correlation
Starting from probabilistic correlations, each correlation has five main features:
- nature,
- direction,
- sign,
- strength,
- statistical generalization capacity.
1. Nature of the Correlation
The nature of the correlation is linear for the simple correlation computation suggested above. This means that a linear function of the form E(Y) = b_0 + b_y1 X_1 is fitted through the scatterplot. Behind a correlation coefficient of, for example, r = 0.40 is a linear model. Many researchers are fixated on the number between 0 and 1 or between 0 and −1, and they tend to forget this background model. Often, they unconsciously treat the linear model as a tacit assumption. They do not seem to realize that nonlinearity occurs frequently.
An example of nonlinearity is the correlation between the percentage of Catholics and the percentage of CDU/CSU (Christlich-Demokratische Union/Christlich-Soziale Union) voters in former West Germany. One might expect a linear correlation: the more Catholics, the more CDU/CSU voters, according to a fixed pattern. However, this “the more, the more” pattern only seems to be valid for communes with many Catholics. For communes with few Catholics, the correlation turns out to be fairly negative: The more Catholics, the fewer CDU/CSU voters. Consequently, the overall scatterplot displays a U pattern. At first, it drops, and from a certain percentage of Catholics onward, it starts to rise. The quadratic function that describes a parabola therefore shows a better fit with the scatterplot than the linear function.
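The U pattern can be illustrated with a small numerical sketch in Python. The data below are invented to mimic the shape described above and are not the actual West German figures. The helper names (solve, polynomial_r2) are hypothetical, and the least-squares fit through the normal equations is simply one convenient way to compare a straight line with a parabola.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (assumes a linear model)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * z[c] for c in range(row + 1, n))
        z[row] = (m[row][n] - tail) / m[row][row]
    return z


def polynomial_r2(xs, ys, degree):
    """R squared of a least-squares polynomial fit of the given degree."""
    powers = range(degree + 1)
    a = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    coef = solve(a, b)
    fitted = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]
    my = mean(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented illustration data (assumption): share of Catholics and share of
# CDU/CSU voters per commune, both as proportions, following a U shape.
catholic = [i / 10 for i in range(11)]
cdu_csu = [0.51, 0.41, 0.37, 0.31, 0.30, 0.29, 0.33, 0.35, 0.43, 0.50, 0.59]

r_all = pearson_r(catholic, cdu_csu)
r_few = pearson_r(catholic[:6], cdu_csu[:6])   # communes with few Catholics
r_many = pearson_r(catholic[5:], cdu_csu[5:])  # communes with many Catholics
r2_linear = polynomial_r2(catholic, cdu_csu, 1)
r2_quadratic = polynomial_r2(catholic, cdu_csu, 2)

print(f"r overall = {r_all:.3f}, few = {r_few:.3f}, many = {r_many:.3f}")
print(f"R2 linear = {r2_linear:.3f}, R2 quadratic = {r2_quadratic:.3f}")
```

In this constructed example the overall linear correlation is weak even though the two halves of the scatterplot show clear but opposite trends, and the parabola accounts for far more of the variance than the straight line does.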
...