In this guide, you will learn how to produce a two-way Analysis of Variance (ANOVA) in Stata using a practical example to illustrate the process. You will find links to the example dataset and you are encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have already opened the data file in Stata.
Two-way ANOVA is a method used to test whether the mean of a continuous variable differs across subsets of the data as defined by two categorical variables. For example, you might explore whether the average weight of people differs depending on which region of the country they live in, whether they are male or female, and the combination of the two. In this way, two-way ANOVA allows researchers to explore whether a continuous variable (e.g., weight) and two categorical variables (e.g., region and gender) are related to each other.
This example uses a subset of data drawn from the 2015 Fuel Consumption Report from Natural Resources Canada. This extract includes data on 1,082 automobiles. The three variables we examine are:
fuelusecity: city fuel consumption rate (liters per 100 km)
fuel: type of fuel
trans2: transmission type
The variable fuelusecity ranges from 4.5 to 30.6 liters per 100 km in this sample dataset, with a mean of 12.53. The variable fuel divides automobile fuel into four categories:
The variable trans2 divides transmission type into two categories:
Automatic
Manual
The city fuel consumption rate is a continuous variable, while type of fuel and transmission type are both categorical variables. This makes this set of variables appropriate for a two-way ANOVA.
Before conducting a two-way ANOVA, we should first look at each variable in isolation. We start by considering summary statistics for our continuous variable fuelusecity using the summarize command. Enter the following command in the Stata Command window:
summarize fuelusecity
Press Enter to produce summary statistics detailing the number of observations, mean, standard deviation, and range for the continuous variable.
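If you want more than the basic summary, the detail option of summarize is one way to get it. The sketch below assumes the guide's dataset is already open; it adds percentiles, skewness, and kurtosis to the output, which can help in judging how symmetric the distribution of fuelusecity is.

```stata
* Assumes the guide's dataset is already open in Stata.
* The detail option adds percentiles, skewness, and kurtosis to the basic summary.
summarize fuelusecity, detail
```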
Next, we present a histogram of city fuel consumption. This can be created in Stata by entering the following command in the Command window:
histogram fuelusecity
Press Enter to produce a histogram. By default, Stata will produce a density histogram. To plot frequencies instead, enter the following command:
histogram fuelusecity, frequency
Alternatively, you can create a histogram by selecting options from the menu as follows:
Graphics → Histogram
In the “histogram - Histograms for continuous and categorical variables” dialog box that opens, you will see a text box labeled “Variable” in the upper left-hand corner. Use the drop-down menu to select fuelusecity from the list of variables. To the right of the “Variable” box, you will see two buttons asking you to specify whether the data are discrete or continuous. Ensure that the “Data are continuous” option has been selected. In the lower right-hand corner under “Y axis,” select “Frequency.” Click OK to produce the histogram.
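As an optional variation on the histogram command, the sketch below overlays a normal curve and then draws a separate histogram for each transmission type. It assumes the guide's dataset is already open. These extra plots are suggestions rather than part of the procedure described here.

```stata
* Assumes the guide's dataset is already open in Stata.
* Frequency histogram with a normal curve overlaid for visual comparison.
histogram fuelusecity, frequency normal
* Separate frequency histograms for each transmission type.
histogram fuelusecity, frequency by(trans2)
```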
We should also produce frequency distributions for the categorical variables fuel and trans2. This is done in Stata using the tabulate command. To produce a frequency distribution for type of fuel, enter the following command in the Stata Command window:
tabulate fuel
Press Enter to produce the frequency distribution.
Alternatively, you can achieve the same results by selecting the following options from the menu:
Statistics → Summaries, tables and tests → Frequency tables → One-way table
In the dialog box that opens, you will see a text box labeled “Categorical variable” in the upper left-hand corner. Use the drop-down menu to select fuel from the list of variables. Click OK to produce the frequency distribution.
The same procedure should be repeated to produce a frequency distribution of transmission type, replacing the variable fuel with trans2.
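The repeated step can be sketched as follows, assuming the guide's dataset is already open. The second command is an optional addition: it cross-tabulates the two categorical variables so you can see how many automobiles fall into each combination of fuel type and transmission type before fitting the two-way ANOVA.

```stata
* Assumes the guide's dataset is already open in Stata.
* Frequency distribution for transmission type.
tabulate trans2
* Optional: cross-tabulation of the two factors to check the count in each cell.
tabulate fuel trans2
```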
Screenshots for the procedures to produce histograms and frequency distributions in Stata are available in the How-to Guides for the Dispersion of a Continuous Variable and Frequency Distribution topics, respectively.
You estimate a two-way ANOVA model in Stata by entering the anova command followed by the continuous variable fuelusecity, the categorical variables fuel and trans2, and an interaction between the two categorical variables fuel#trans2. Enter the following command in the Command window:
anova fuelusecity fuel trans2 fuel#trans2
Press Enter to perform the analysis.
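Stata's factor-variable notation also offers a shorthand for this model. In the sketch below, which assumes the guide's dataset is already open, the ## operator expands to both main effects plus their interaction, so it should fit the same model as the command above.

```stata
* Assumes the guide's dataset is already open in Stata.
* fuel##trans2 expands to: fuel trans2 fuel#trans2
anova fuelusecity fuel##trans2
```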
Again, you can perform the analysis by using the menu options instead. Select the following options from the menu:
Statistics → Linear models and related → ANOVA/MANOVA → Analysis of variance and covariance
At the top left-hand side of the “anova - Analysis of variance and covariance” dialog box that opens, you will see a text box labeled “Dependent variable.” Select fuelusecity from the drop-down options. In the text box beneath it labeled “Model,” choose fuel and trans2 from the list of variables provided. Figure 1 shows what the dialog box looks like in Stata.
Two-way ANOVA allows researchers to assess the main effect of each categorical variable on the continuous variable in question and the interactive effect of the two categorical variables combined. The interaction between fuel and trans2 must be added to the model. To do this, click on the button with three dots to the side of the text box labeled “Model.”
A second dialog box, “Create varlist with factor variables,” will open. At the top, in the “Type of variable” section, “Factor variable” should be selected. In the text box labeled “Specification,” select “Interaction (2-way)” from the list of options. Clicking on this will produce two “Variables” text boxes. Select the two categorical variables fuel and trans2 and click Add to varlist in the lower right-hand corner.
You will see that the new interaction variable fuel#trans2 appears in the “Varlist” text box at the bottom.
Click OK to return to the “anova - Analysis of variance and covariance” dialog box, then click OK again to perform the analysis.
Figure 2 shows what the dialog box looks like in Stata.
The histogram of fuelusecity is presented in Figure 3. The values are clustered around the mean of 12.53. There are a few extreme values, with the largest being 30.6 liters per 100 km. Researchers may want to explore whether cases with these extreme values have undue influence on the analysis.
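One possible way to start looking at the extreme values is sketched below, assuming the guide's dataset is already open. The 99th-percentile cutoff is an arbitrary choice made for illustration, not a rule from this guide.

```stata
* Assumes the guide's dataset is already open in Stata.
* Box plot: points beyond the whiskers are drawn as outside values.
graph box fuelusecity
* Store percentiles, then list cases at or above the 99th percentile
* (an arbitrary cutoff chosen for illustration).
summarize fuelusecity, detail
list fuelusecity fuel trans2 if fuelusecity >= r(p99) & !missing(fuelusecity)
```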
The Stata output for the frequency distribution of the variable fuel is presented in Figure 4. The table shows that, of the four types of automobile fuel in the sample, Regular Gas and Premium Gas each have more than 400 observations, while the remaining two types account for close to 100 observations in the dataset.
The Stata output for the frequency distribution of the variable trans2 is presented in Figure 5. It shows that most of the automobiles in the sample have an automatic transmission (877). The rest in the sample have a manual transmission (205).
The Stata output for the two-way ANOVA is presented in Figure 6.
The table in Figure 6 reports many results. Here, we focus on the F-tests and their associated levels of statistical significance, or p values. The results show evidence of a statistically significant main effect of type of fuel, a statistically significant main effect of transmission type, and a statistically significant interaction between fuel type and transmission type on city fuel consumption. These conclusions are based on the reported p value for each test being less than .05.
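The F-tests indicate that effects exist but not what they look like. As an optional follow-up, the sketch below uses Stata's margins and marginsplot commands to display the predicted mean city fuel consumption for each combination of fuel type and transmission type. It assumes the anova command from this guide has just been run.

```stata
* Assumes the guide's dataset is open and the anova model has just been fit.
* Predicted mean city fuel consumption for each fuel-by-transmission combination.
margins fuel#trans2
* Plot those predicted means; clearly non-parallel lines are consistent with an interaction.
marginsplot
```

If a combination of fuel type and transmission type has no observations, margins will report that cell as not estimable.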
Download the sample dataset and see whether you can replicate these results. The dataset also includes a measure of highway fuel consumption, named fuelusehwy. Try producing your own two-way ANOVA using this measure in place of the city fuel consumption rate used in this example.