Analysis of Variance

Edward L. Boone

doi:10.4135/9781506326139

In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation
Chapter DOI: https://doi.org/10.4135/9781506326139.n41
Subject: Education

Researchers often need to determine whether the means of two or more groups differ. The analysis of variance (ANOVA) technique is a parametric hypothesis test that answers this question. ANOVA partitions the overall variance in the data into a component explained by the groupings and a component left unexplained by the groupings. Often the groups are defined by which treatment has been given to each of the experimental units in the group. When the experimental units are randomly assigned to the treatment groups, ANOVA can be used to support causal conclusions. This entry discusses the basic principles of ANOVA, its organization and extensions, its use in contrasts and post hoc tests, and its assumptions.

Basic Principles

The simplest case of ANOVA is the one-way ANOVA, in which the groups vary across only one factor and each group has the same sample size. Suppose there are k groups and n observations are sampled from each group, for a total sample size of nk. For notation, let $y_{ij}$ be the measurement of the outcome of interest for the jth sample in the ith group, and let $\mu_{i}$ denote the population mean of group i. In this notation, the ANOVA null hypothesis is:

$H_{0} : \mu_{1} = \mu_{2} = \cdots = \mu_{k} .$

This hypothesis corresponds to the state where all of the means $\mu_{i}$ are equal to each other and hence do not differ. The alternative hypothesis in this case is:

$H_{A} : \text{at least two } \mu_{i} \text{ differ} .$

If the ANOVA test rejects $H_{0}$ in favor of $H_{A}$, there is enough evidence to conclude that at least two of the group means differ.

To accomplish this, ANOVA partitions the overall variance. The overall variance is simply the sample standard deviation squared of all of the data regardless of treatment group. In our notation, we would have:

$S^{2} = \frac{\sum_{i = 1}^{k} \sum_{j = 1}^{n} {(y_{i j} - {\bar{y}}_{\cdot \cdot})}^{2}}{n k - 1},$

where ${\bar{y}}_{\cdot \cdot} = \frac{1}{n k} \sum_{i = 1}^{k} \sum_{j = 1}^{n} y_{i j}$ is the overall mean. In our notation, ${\bar{y}}_{i \cdot} = \frac{1}{n} \sum_{j = 1}^{n} y_{i j}$ is the sample mean for the ith group. The denominator $n k - 1$ plays no role in partitioning the variance, so it is dropped to create the sum of squares total, denoted by SSTO and given by:

$SS_{TO} = \sum_{i = 1}^{k} \sum_{j = 1}^{n} {(y_{i j} - {\bar{y}}_{\cdot \cdot})}^{2}$
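As a small numerical illustration of these two quantities, the sketch below computes the overall mean, the overall variance, and the SSTO for a made-up data set. The data values and variable names are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
from statistics import mean, variance

# Hypothetical data (not from the entry): k = 3 groups, n = 4 observations each.
groups = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 9.0, 8.0],
    [1.0, 2.0, 3.0, 2.0],
]
k = len(groups)
n = len(groups[0])

# Pool all observations, ignoring group membership.
all_obs = [y for group in groups for y in group]
grand_mean = mean(all_obs)

# Sum of squares total: squared deviations from the overall mean.
ss_to = sum((y - grand_mean) ** 2 for y in all_obs)

# Overall sample variance: SSTO divided by nk - 1.
s2 = ss_to / (n * k - 1)

# Cross-check against the standard library's sample variance.
assert abs(s2 - variance(all_obs)) < 1e-9

print(f"grand mean = {grand_mean}, SSTO = {ss_to}, S^2 = {s2:.4f}")
```

The cross-check simply confirms that dividing the SSTO by nk − 1 recovers the usual sample variance of the pooled data.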

By adding and subtracting the group sample mean inside the SSTO and expanding the square (some details of the algebra are omitted), one can obtain:

$\begin{array}{l} SS_{TO} = \sum_{i = 1}^{k} \sum_{j = 1}^{n} {(y_{i j} - {\bar{y}}_{i \cdot} + {\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot})}^{2} \\ = \sum_{i = 1}^{k} \sum_{j = 1}^{n} [{(y_{i j} - {\bar{y}}_{i \cdot})}^{2} + 2 (y_{i j} - {\bar{y}}_{i \cdot}) ({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot}) + {({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot})}^{2}] \\ = \dots \\ = \underbrace{\sum_{i = 1}^{k} \sum_{j = 1}^{n} {(y_{i j} - {\bar{y}}_{i \cdot})}^{2}}_{\text{Error}} + \underbrace{\sum_{i = 1}^{k} n {({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot})}^{2}}_{\text{Treatment}} \\ = SS_{E} + SS_{T} . \end{array}$
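The key step in the omitted algebra is that the cross term sums to zero. A brief sketch, using only the definitions of the group and overall means given above:

$\sum_{i = 1}^{k} \sum_{j = 1}^{n} 2 (y_{i j} - {\bar{y}}_{i \cdot}) ({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot}) = 2 \sum_{i = 1}^{k} ({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot}) \sum_{j = 1}^{n} (y_{i j} - {\bar{y}}_{i \cdot}) = 0,$

because within each group $\sum_{j = 1}^{n} (y_{i j} - {\bar{y}}_{i \cdot}) = \sum_{j = 1}^{n} y_{i j} - n {\bar{y}}_{i \cdot} = 0$. The remaining squared term does not depend on j, so summing it over the n observations in a group gives the factor n in the treatment component:

$\sum_{j = 1}^{n} {({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot})}^{2} = n {({\bar{y}}_{i \cdot} - {\bar{y}}_{\cdot \cdot})}^{2} .$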

Notice that the SSTO can thus be expressed as the sum of a term associated with error and a term associated with the treatment group. This is the essence of ANOVA: partitioning the SSTO into meaningful components. Furthermore, each component is itself a sum of squared items; hence, the components are called the sum of squares error, SSE, and the sum of squares treatment, SST. Note that in this one-way ANOVA scenario, SST is often called the sum of squares between, denoted by SSB, and SSE is often called the sum of squares within, denoted by SSW.
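The partition can be checked numerically. The following is a minimal sketch for the equal-sample-size case only; the function name `one_way_sums_of_squares` and the example data are hypothetical and are not part of the entry.

```python
def one_way_sums_of_squares(groups):
    """Return (SSTO, SSE, SST) for a one-way layout with equal group sizes.

    `groups` is a list of k lists, each holding n numeric observations.
    """
    n = len(groups[0])
    if any(len(group) != n for group in groups):
        raise ValueError("this sketch assumes equal sample sizes")

    all_obs = [y for group in groups for y in group]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(group) / n for group in groups]

    # Total: deviations of every observation from the overall mean.
    ss_to = sum((y - grand_mean) ** 2 for y in all_obs)
    # Error (within): deviations of observations from their own group mean.
    ss_e = sum(
        (y - m) ** 2 for group, m in zip(groups, group_means) for y in group
    )
    # Treatment (between): deviations of group means from the overall mean,
    # each weighted by the group size n.
    ss_t = sum(n * (m - grand_mean) ** 2 for m in group_means)
    return ss_to, ss_e, ss_t


# Hypothetical data: k = 3 groups with n = 4 observations each.
example = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 9.0, 8.0],
    [1.0, 2.0, 3.0, 2.0],
]
ss_to, ss_e, ss_t = one_way_sums_of_squares(example)

# The two components should add back up to the total.
assert abs(ss_to - (ss_e + ss_t)) < 1e-9
print(f"SSTO = {ss_to}, SSE = {ss_e}, SST = {ss_t}")
```

In this example most of the total sum of squares falls in the treatment component, which reflects group means that are far apart relative to the spread within each group.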

Similarly, for the one-way ANOVA case with equal sample sizes, the total degrees of freedom, dfTO = nk − 1, associated with the SSTO can also be partitioned into degrees of freedom for error, dfE = k(n − 1), and degrees of freedom for treatment, dfT = k − 1. As with the sums of squares, the degrees of freedom add together: dfTO = dfE + dfT, since k(n − 1) + (k − 1) = nk − 1.

...
