
Path analysis is a statistical procedure for testing hypothesized causal relationships among observed variables. In a path analysis model, the cause–effect relationships are not discovered through data analysis; they are specified in advance on the basis of the researcher’s knowledge or previous studies. Path analysis was initially developed by Sewall Wright in 1921 for examining the direct and indirect effects of variables on other variables; a century later, it continues to be a popular statistical procedure. Since the 1970s, with the rapid development of a more comprehensive family of statistical techniques known as structural equation modeling (SEM), path analysis has been viewed as a special type of SEM in which only observed variables are involved in the analysis.

Example of a Path Analysis Model

As an example, a researcher formulates the following hypotheses: X1 and X2 are the common causes of Y1 and Y2, and both Y1 and Y2 are the causes of Y3. This path analysis model can be represented in a path diagram (Figure 1).

Figure 1 An Example of a Path Analysis Model


The corresponding regression equations are

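\[
\begin{aligned}
Y_1 &= \tau_1 + \beta_1 X_1 + \beta_2 X_2 + D_1,\\
Y_2 &= \tau_2 + \beta_3 X_1 + \beta_4 X_2 + D_2,\\
Y_3 &= \tau_3 + \beta_5 Y_1 + \beta_6 Y_2 + D_3,
\end{aligned}
\]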

where β1 to β6 denote the path coefficients from a predictor to an outcome variable, and D1 to D3 are the residuals for the corresponding outcome variables. Because an intercept term τ is included in each equation, this set of equations corresponds to unstandardized regression models. Although intercepts can be shown in a path diagram, they are typically omitted when the analysis focuses on the cause–effect relationships between variables.
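The link between the path model and ordinary regression can be illustrated with a small simulation. The sketch below is only one possible illustration, not the entry's own procedure: it assumes a recursive model with uncorrelated disturbances, in which case each equation can be estimated separately by least squares. Following the stated hypotheses, Y3 is predicted by Y1 and Y2. All numeric values (intercepts, path coefficients, sample size, random seed) and the helper name `ols` are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 5000                        # hypothetical sample size

# Hypothetical "true" intercepts and path coefficients (assumed values).
tau1, tau2, tau3 = 1.0, -0.5, 2.0
b1, b2, b3, b4, b5, b6 = 0.5, 0.3, 0.4, -0.2, 0.6, 0.7

# Exogenous variables X1 and X2 with unit variances and covariance 0.3.
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n)
X1, X2 = X[:, 0], X[:, 1]

# Mutually independent disturbances D1, D2, D3.
D1, D2, D3 = rng.normal(size=(3, n))

# Generate the endogenous variables from the hypothesized structure.
Y1 = tau1 + b1 * X1 + b2 * X2 + D1
Y2 = tau2 + b3 * X1 + b4 * X2 + D2
Y3 = tau3 + b5 * Y1 + b6 * Y2 + D3


def ols(y, *predictors):
    """Return [intercept, slope_1, slope_2, ...] from a least-squares fit."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


eq1 = ols(Y1, X1, X2)  # estimates of tau1, beta1, beta2
eq2 = ols(Y2, X1, X2)  # estimates of tau2, beta3, beta4
eq3 = ols(Y3, Y1, Y2)  # estimates of tau3, beta5, beta6

print("Y1 equation:", np.round(eq1, 3))
print("Y2 equation:", np.round(eq2, 3))
print("Y3 equation:", np.round(eq3, 3))
```

With a sample of this size, the printed estimates should fall close to the assumed values. Dedicated SEM software would typically estimate all equations simultaneously and also report model fit, which this sketch does not attempt.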

Key Components

In a path analysis model, an observed variable is presented within a square or rectangle. An observed variable is either exogenous (X1 and X2) or endogenous (Y1, Y2, and Y3), which corresponds to a predictor or outcome variable, respectively, in a regression model. The cause of an exogenous variable is not included in the model, whereas the cause of an endogenous variable is explicitly specified.

The causal relationship is indicated by a single-headed arrow (e.g., X1 → Y1 in Figure 1), with the variable at the tail of the arrow being the cause and the variable at the head of the arrow being the effect. The direct effect from one variable to another is quantified by the path coefficient, similar to the slope in regression analysis. In a path analysis model, exogenous variables may also affect endogenous variables through some intermediate variables (X1 → Y1 → Y3). The intermediate variables are sometimes called mediators, and the mediating effect is called the indirect effect.

The double-headed curved arrow at the top of an exogenous variable indicates the variance of the exogenous variable (φ11, φ22), and the double-headed curved arrow connecting two exogenous variables is the covariance between the two variables (φ12). Each endogenous variable has an unobserved disturbance (D1, D2, and D3), represented within a circle or oval. A disturbance contains two confounded components: the measurement error of the endogenous variable and all the causes of the endogenous variable that are not explicitly specified in the model. If it is known that a pair of endogenous variables share some common missing causes, their disturbances can be correlated. The path coefficient from a disturbance to its endogenous variable is typically fixed at 1 to assign a metric to the disturbance.
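As a worked illustration of the indirect effect, the standard product-of-coefficients rule for recursive path models multiplies the path coefficients along each chain of arrows. The labeling here is an assumption: β1 and β3 are taken to be the paths X1 → Y1 and X1 → Y2, and β5 and β6 the paths Y1 → Y3 and Y2 → Y3, as in the hypothesized model.

```latex
\begin{aligned}
\text{indirect effect of } X_1 \text{ on } Y_3 \text{ via } Y_1 &= \beta_1 \beta_5,\\
\text{indirect effect of } X_1 \text{ on } Y_3 \text{ via } Y_2 &= \beta_3 \beta_6,\\
\text{total indirect effect of } X_1 \text{ on } Y_3 &= \beta_1 \beta_5 + \beta_3 \beta_6.
\end{aligned}
```

Because the example model contains no single-headed arrow directly from X1 to Y3, the total effect of X1 on Y3 equals this total indirect effect.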

...
