
Analysis of Variance

Researchers often need to determine whether the means of two or more groups differ. The analysis of variance (ANOVA) is a parametric hypothesis test designed to answer this question. ANOVA partitions the overall variability in the data into a component explained by the groupings and a component left unexplained by them. Often the groups are defined by the treatment given to the experimental units in each group. When the experimental units are randomly assigned to the treatment groups, ANOVA can support a causal conclusion about the effect of the treatment. This entry discusses the basic principles of ANOVA, its organization and extensions, its use in contrasts and post hoc tests, and its assumptions.

Basic Principles

The simplest case is the one-way ANOVA, in which the groups vary across only one factor and every group has the same sample size. Suppose there are $k$ groups with $n$ observations in each, for a total sample size of $nk$. Let $y_{ij}$ denote the measured outcome of interest for the $j$th observation in the $i$th group, and let $\mu_i$ denote the population mean of group $i$. In this notation, the ANOVA null hypothesis is:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k.$$

This hypothesis states that all of the means $\mu_i$ are equal, so that no two groups differ. The alternative hypothesis is:

$$H_A: \text{at least two of the } \mu_i \text{ differ.}$$

If the ANOVA test rejects $H_0$ in favor of $H_A$, there is enough evidence to conclude that at least two of the population group means differ; the test by itself does not identify which ones.

To reach this conclusion, ANOVA partitions the overall variability. The overall variance is simply the sample variance (the squared sample standard deviation) of all the data, ignoring treatment group. In our notation:

$$S^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}\right)^2}{nk - 1},$$

where $\bar{y} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}$ is the overall mean. Similarly, $\bar{y}_i = \frac{1}{n}\sum_{j=1}^{n} y_{ij}$ is the sample mean of the $i$th group. The denominator $nk - 1$ plays no role in the partition, so it is dropped, leaving the total sum of squares, denoted SSTO:

$$\mathrm{SSTO} = \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}\right)^2$$
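As a quick check of these definitions, the following sketch computes $\bar{y}$, the group means $\bar{y}_i$, $S^2$, and SSTO for a small made-up balanced data set. The data values and the function name `total_sum_of_squares` are illustrative assumptions, not part of the entry. The check at the end confirms that SSTO is just $S^2$ with its denominator $nk - 1$ multiplied back in.

```python
from statistics import fmean, variance

# Hypothetical balanced data: k = 3 groups, n = 4 observations each.
# data[i][j] plays the role of y_ij (indices are 0-based here).
data = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 6.0, 7.0],
    [5.0, 6.0, 7.0, 6.0],
]

def total_sum_of_squares(groups):
    """Return (overall mean, group means, S^2, SSTO) for balanced groups."""
    all_values = [y for group in groups for y in group]
    grand_mean = fmean(all_values)                    # y-bar
    group_means = [fmean(group) for group in groups]  # y-bar_i
    s_squared = variance(all_values)                  # sample variance, divisor nk - 1
    ssto = sum((y - grand_mean) ** 2 for y in all_values)
    return grand_mean, group_means, s_squared, ssto

grand_mean, group_means, s_squared, ssto = total_sum_of_squares(data)
n_total = sum(len(group) for group in data)  # nk
# SSTO equals S^2 with the denominator nk - 1 multiplied back in.
assert abs(ssto - (n_total - 1) * s_squared) < 1e-9
print(grand_mean, group_means, ssto)  # prints: 6.0 [5.0, 7.0, 6.0] 14.0
```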

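Adding and subtracting the group sample mean $\bar{y}_i$ inside each squared term and expanding gives the following. The cross-product term vanishes because the deviations within each group sum to zero, $\sum_{j=1}^{n}(y_{ij} - \bar{y}_i) = 0$, and summing the constant $(\bar{y}_i - \bar{y})^2$ over the $n$ observations in group $i$ gives $n(\bar{y}_i - \bar{y})^2$: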

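$$\begin{aligned}
\mathrm{SSTO} &= \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_i + \bar{y}_i - \bar{y}\right)^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{n}\left[\left(y_{ij} - \bar{y}_i\right)^2 + 2\left(y_{ij} - \bar{y}_i\right)\left(\bar{y}_i - \bar{y}\right) + \left(\bar{y}_i - \bar{y}\right)^2\right] \\
&= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_i\right)^2}_{\text{Error}} + \underbrace{\sum_{i=1}^{k} n\left(\bar{y}_i - \bar{y}\right)^2}_{\text{Treatment}} \\
&= \mathrm{SSE} + \mathrm{SST}.
\end{aligned}$$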

In this way, SSTO is expressed as the sum of a term associated with error and a term associated with the treatment groups. This partition of SSTO into meaningful components is the essence of ANOVA. Because each component is itself a sum of squared terms, the components are called the sum of squares error (SSE) and the sum of squares treatment (SST). In the one-way setting, SST is also called the sum of squares between (SSB), and SSE the sum of squares within (SSW).
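The partition derived above can be checked numerically. This minimal sketch, on made-up balanced data and with a hypothetical function name `one_way_sums_of_squares`, computes SSE (within groups) and SST (between groups) directly from their definitions and confirms that they add up to SSTO.

```python
from statistics import fmean

# Hypothetical balanced data: k = 3 groups, n = 4 observations each (y_ij).
data = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 6.0, 7.0],
    [5.0, 6.0, 7.0, 6.0],
]

def one_way_sums_of_squares(groups):
    """Return (SSTO, SSE, SST) for a balanced one-way layout."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes n")
    grand_mean = fmean([y for g in groups for y in g])
    group_means = [fmean(g) for g in groups]
    # Total: squared deviations of every observation from the overall mean.
    ssto = sum((y - grand_mean) ** 2 for g in groups for y in g)
    # Error (within): deviations of each observation from its own group mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Treatment (between): n times each squared deviation of a group mean.
    sst = sum(n * (m - grand_mean) ** 2 for m in group_means)
    return ssto, sse, sst

ssto, sse, sst = one_way_sums_of_squares(data)
print(ssto, sse, sst)  # prints: 14.0 6.0 8.0
assert abs(ssto - (sse + sst)) < 1e-9
```

Here the group means (5, 7, 6) spread around the overall mean of 6, so most of the total variability (8 of 14) is attributed to the treatment term.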

Similarly, for the one-way ANOVA with equal sample sizes, the total degrees of freedom associated with SSTO, $df_{TO} = nk - 1$, can be partitioned into the error degrees of freedom, $df_E = k(n - 1)$, and the treatment degrees of freedom, $df_T = k - 1$. As with the sums of squares, the degrees of freedom add: $df_{TO} = df_E + df_T$, since $k(n - 1) + (k - 1) = nk - 1$.
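For the balanced one-way layout, the degrees of freedom depend only on $k$ and $n$. This small sketch (the function name is hypothetical) computes them and checks that they add up; with $k = 3$ groups of $n = 4$ observations, it gives $11 = 9 + 2$.

```python
def one_way_degrees_of_freedom(k, n):
    """Degrees of freedom for a balanced one-way ANOVA with k groups of size n."""
    df_total = n * k - 1      # dfTO: nk deviations from y-bar, which sum to zero
    df_error = k * (n - 1)    # dfE: n - 1 per group, summed over the k groups
    df_treatment = k - 1      # dfT: k group means around the overall mean
    assert df_total == df_error + df_treatment
    return df_total, df_error, df_treatment

print(one_way_degrees_of_freedom(3, 4))  # prints: (11, 9, 2)
```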

...

