Two-Way Analysis of Variance

Jeffrey R. Harring; Tessa Johnson

doi:10.4135/9781506326139


In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation
Chapter DOI: https://doi.org/10.4135/9781506326139.n718
Subject: Education


Two-way analysis of variance (ANOVA) is a statistical technique used to analyze data from a study in which a researcher wishes to examine both the separate and the combined effects of two categorical independent variables, called factors, on a continuous dependent (or outcome) variable. While the ideas of ANOVA as a statistical approach date back more than two centuries, it was not until the seminal work of R. A. Fisher in the 1920s on analyzing data from complex experiments that two-way ANOVA became a popular, reliable procedure used by practitioners and methodologists alike. This entry first describes the data analytic context for ANOVA and the logic behind its implementation. Two-way ANOVA is then introduced and several key analytic elements are discussed in the context of a real data example.

The Logic of ANOVA

In its simplest form, a one-way ANOVA assesses whether mean differences exist on a single outcome variable across levels of a single factor. Historically, ANOVA was used to analyze experimental data in which the independent or grouping variable was manipulated by the researcher. For example, a random sample of subjects desiring to lose weight may be randomly assigned to one of four groups: a dieting group, an exercise group, a dieting and exercise group, or a control group (for which there is no intervention). The mean weight loss computed for each group is compared with that of every other group to see which treatment was the most effective weight loss regimen. Although ANOVA was initially developed for data obtained through experimentation, it is also applicable to data from quasi-experimental and observational studies, where some or all of the factors are not manipulated and groups are intact.

Interestingly, the means of the outcome variable across levels of the factor in ANOVA are not compared directly. Instead, the magnitudes of their differences are evaluated by partitioning, then comparing, different sources of variability in the outcome. The overall variation in scores on the outcome can be partitioned into two components: variation of individual values around their group means and variation of the group means around the overall mean. These two sources of variation are frequently referred to as within-group and between-group variability, respectively. If the within-group variation is small compared to the between-group variation, this suggests that the population means are different. Mean differences across levels of a factor are formally tested using a test of significance based on the F distribution, which tests the null hypothesis (H0) that the means of the J groups are equal:

$H_{0} : \mu_{1} = \mu_{2} = \cdots = \mu_{J} .$
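The partition of variability described above can be checked numerically. The following is a minimal sketch in plain Python; the weight-loss scores, the group labels, and the function name `partition_ss` are made up for illustration and do not come from the entry's data example.

```python
from statistics import fmean

def partition_ss(groups):
    """Return (SS_total, SS_between, SS_within) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    grand_mean = fmean(all_scores)
    # Variation of every score around the overall (grand) mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_between, ss_within

# Hypothetical weight loss in kilograms (illustrative values only).
groups = [
    [2.0, 3.0, 4.0],  # dieting
    [4.0, 5.0, 6.0],  # exercise
    [0.0, 1.0, 2.0],  # control
]
ss_total, ss_between, ss_within = partition_ss(groups)
print(ss_total, ss_between, ss_within)
```

For these scores the between-group and within-group sums of squares add up to the total sum of squares, which is the identity the partition relies on.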

More formally, the F test is used to compare the equality of two variances—the variance of scores within groups and the variance of means between groups. These variance estimates, called mean squares, are computed as the sum of squares divided by their respective degrees of freedom:

$\begin{aligned} MS_{between} &= \frac{SS_{between}}{df_{between}} = \frac{\sum_{j = 1}^{J} n_{j} (\bar{Y}_{.j} - \bar{Y}_{..})^{2}}{J - 1}, \\ MS_{within} &= \frac{SS_{within}}{df_{within}} = \frac{\sum_{j = 1}^{J} \sum_{i = 1}^{n_{j}} (Y_{ij} - \bar{Y}_{.j})^{2}}{N - J} . \end{aligned}$

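Here, $n_{j}$ is the number of observations in group $j$, $N$ is the total number of observations, $\bar{Y}_{.j}$ is the mean of group $j$, and $\bar{Y}_{..}$ is the grand mean. The F test statistic is calculated as the ratio of these mean squares: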

$F = \frac{M S_{between}}{M S_{within}} .$
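As a hedged illustration of these formulas, the sketch below computes the two mean squares and their ratio in plain Python. The four groups mirror the weight-loss example, but the scores and the function name `one_way_f` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import fmean

def one_way_f(groups):
    """Return (MS_between, MS_within, F) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    n_total = len(all_scores)   # N
    n_groups = len(groups)      # J
    grand_mean = fmean(all_scores)
    group_means = [fmean(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    ms_between = ss_between / (n_groups - 1)       # df_between = J - 1
    ms_within = ss_within / (n_total - n_groups)   # df_within = N - J
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical weight loss in kilograms (illustrative values only).
groups = [
    [3.0, 4.0, 5.0],  # dieting
    [2.0, 3.0, 4.0],  # exercise
    [5.0, 6.0, 7.0],  # dieting and exercise
    [0.0, 1.0, 2.0],  # control
]
ms_between, ms_within, f_stat = one_way_f(groups)
print(ms_between, ms_within, f_stat)  # 13.0 1.0 13.0
```

With these made-up scores, the sum of squares between groups is 39 on 3 degrees of freedom and the sum of squares within groups is 8 on 8 degrees of freedom, so F = 13. Judging significance would additionally require the F distribution with (J - 1, N - J) degrees of freedom, which this sketch does not compute.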

$MS_{within}$ is an estimate of the population variance, $\sigma^{2}$, based on the deviations of scores about their group means. It is not influenced by mean differences among the groups. $MS_{between}$ is based on the deviations of the group means about the grand mean. Because its value is affected by any group mean differences that exist in the population, it estimates the same population variance only if those group effects are zero, that is, if the null hypothesis is true. Under the null hypothesis, the two mean squares estimate the same population value, and thus their ratio should be approximately 1. If there were true group mean differences, $MS_{between}$ would be sensitive to them, but $MS_{within}$ would not. Therefore, a large computed F test statistic suggests that group mean differences do exist in the population and that the null hypothesis should be rejected.
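This behavior of the F ratio can be illustrated with a small simulation. The sketch below is not part of the entry: it assumes normally distributed scores with a common standard deviation of 1, three groups of 20, and arbitrary population means, and the helper names `f_ratio` and `average_f` are hypothetical.

```python
import random
from statistics import fmean

def f_ratio(groups):
    """F = MS_between / MS_within for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    grand_mean = fmean(all_scores)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def average_f(pop_means, n_per_group, reps, rng):
    """Average F over repeated samples drawn from normal populations (sd = 1)."""
    total = 0.0
    for _ in range(reps):
        groups = [
            [rng.gauss(mu, 1.0) for _ in range(n_per_group)] for mu in pop_means
        ]
        total += f_ratio(groups)
    return total / reps

rng = random.Random(2024)  # fixed seed so the run is reproducible

# Null hypothesis true: all population means are equal.
mean_f_null = average_f([0.0, 0.0, 0.0], n_per_group=20, reps=2000, rng=rng)
# Null hypothesis false: the population means differ.
mean_f_alt = average_f([0.0, 1.0, 2.0], n_per_group=20, reps=2000, rng=rng)

print(f"average F, equal means:     {mean_f_null:.2f}")
print(f"average F, different means: {mean_f_alt:.2f}")
```

Under these assumptions the average F stays close to 1 when the population means are equal and becomes much larger when they differ, because only $MS_{between}$ responds to the mean differences.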

...
