
Two-Way Analysis of Variance

Two-way analysis of variance (ANOVA) is a statistical technique used to analyze data from a study in which a researcher wishes to examine both the separate and the combined effects of two categorical independent variables, called factors, on a continuous dependent (or outcome) variable. While the ideas of ANOVA as a statistical approach date back more than two centuries, it was not until the seminal work of R. A. Fisher in the 1920s on analyzing data from complex experiments that two-way ANOVA became a popular, reliable procedure used by practitioners and methodologists alike. This entry first describes the data analytic context for ANOVA and the logic behind its implementation. Two-way ANOVA is then introduced and several key analytic elements are discussed in the context of a real data example.

The Logic of ANOVA

In its simplest form, a one-way ANOVA assesses whether mean differences exist on a single outcome variable across levels of a single factor. Historically, ANOVA was used to analyze experimental data in which the independent or grouping variable was manipulated by the researcher. For example, a random sample of subjects desiring to lose weight may be randomly assigned to a dieting group, an exercise group, a dieting and exercise group, or a control group (for which there is no intervention). The mean weight loss computed for each group is compared with that of every other group to see which treatment was the most effective weight loss regimen. Although ANOVA was originally developed for data obtained through experimentation, it is also applicable to data from quasi-experimental and observational studies, where some or all of the factors are not manipulated and groups are intact.

Interestingly, the means of the outcome variable across levels of the factor are not compared directly in ANOVA; rather, the magnitudes of their differences are evaluated by partitioning, then comparing, different sources of variability in the outcome. The overall variation in scores on the outcome can be partitioned into two components: variation of individual values around their group means and variation of the group means around the overall mean. These two sources of variation are frequently referred to as within-group and between-group variability, respectively. If the within-group variation is small compared to the between-group variation, this suggests that the population means are different. Mean differences across levels of a factor are formally tested using a test of significance based on the F distribution, which tests the null hypothesis (H0) that the means of the J groups are equal:

\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_J. \]

More formally, the F test is used to compare the equality of two variances—the variance of scores within groups and the variance of means between groups. These variance estimates, called mean squares, are computed as the sum of squares divided by their respective degrees of freedom:

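\[ MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{\sum_{j=1}^{J} n_j \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2}{J - 1}, \qquad MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)^2}{N - J}. \]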

The F test statistic is calculated as the ratio of these mean squares or variances:

\[ F = \frac{MS_{\text{between}}}{MS_{\text{within}}}. \]
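The mean squares and the F ratio can be computed directly from raw scores. The following is a minimal sketch in plain Python, not a reference implementation; the function name `one_way_anova` and the weight-loss values are hypothetical and were made up purely to illustrate the four-group example described earlier.

```python
def one_way_anova(groups):
    """Return (ms_between, ms_within, f_stat) for a list of groups of scores."""
    n_total = sum(len(g) for g in groups)            # N
    n_groups = len(groups)                           # J
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group means around the grand mean,
    # each squared deviation weighted by its group size n_j.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    ms_between = ss_between / (n_groups - 1)         # df_between = J - 1
    ms_within = ss_within / (n_total - n_groups)     # df_within  = N - J
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical weight loss scores (illustrative values only):
# diet, exercise, diet and exercise, control.
groups = [[3, 5, 4], [2, 4, 3], [6, 8, 7], [0, 1, 2]]
ms_between, ms_within, f_stat = one_way_anova(groups)
print(ms_between, ms_within, f_stat)  # prints: 18.75 1.0 18.75
```

With these made-up scores the between-groups mean square is far larger than the within-groups mean square, which is the pattern expected when group means differ. Judging significance would still require comparing F to the F distribution with J - 1 and N - J degrees of freedom.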

MS_within is an estimate of the population variance, σ², based upon the deviation of scores about the group means. It is not influenced by mean differences among the groups. MS_between is also an estimate of the population variance if the null hypothesis is true. It is based upon the deviations of group means about the grand mean. Because its value is impacted by any group mean differences that exist in the population, it is only an estimate of the same population variance if those group effects are assumed to be zero, that is, if the null hypothesis is true. Under the null hypothesis, these two mean squares are thought to be estimating the same population value, and thus, their ratio should be approximately 1. If there were true group mean differences, MS_between would be sensitive to them, but MS_within would not. Therefore, a large computed F test statistic suggests that group mean differences, in fact, do exist in the population and the null hypothesis should be rejected.
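The argument in this paragraph can be summarized by the expected values of the two mean squares. The expressions below are a sketch under the usual fixed-effects assumptions (independent scores with a common within-group variance); the symbols \mu_j for the population mean of group j and \bar{\mu} for the sample-size-weighted mean of the \mu_j are introduced here for illustration and do not appear in the original text.

```latex
\[
E\left(MS_{\text{within}}\right) = \sigma^2,
\qquad
E\left(MS_{\text{between}}\right) = \sigma^2 + \frac{\sum_{j=1}^{J} n_j \left(\mu_j - \bar{\mu}\right)^2}{J - 1},
\qquad
\bar{\mu} = \frac{1}{N} \sum_{j=1}^{J} n_j \mu_j .
\]
```

When all \mu_j are equal, the second term vanishes and both mean squares estimate \sigma^2; any difference among the population means can only increase the expected value of the numerator of F.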

...
