In this guide, you will learn how to produce estimates, standard errors, and confidence intervals for some population parameters using a sample that has been drawn using stratification and differing sampling rates between strata. The example focuses on means and proportions, but the principles demonstrated apply to a variety of estimators, including statistical modeling. The example assumes that you have already opened the data file in Stata.
A stratified random sample is obtained by dividing the population units into a set of mutually exclusive and exhaustive groups (strata), then drawing a sample of units from each stratum. Stratified sampling is preferred over simple random sampling for one (or more) of four reasons:
Regardless of the motivation for using stratification, parameter estimates, standard errors, and hypothesis test results are obtained the same way, and that way differs from methods that assume simple random sampling.
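To make the difference from simple random sampling concrete, the formulas below sketch a weighted mean and a linearized variance estimate for a single-stage stratified sample with no finite population correction. The notation is introduced here for illustration and does not come from the dataset: h = 1, ..., H indexes strata, i = 1, ..., n_h indexes sampled units within stratum h, w_{hi} is the sampling weight, and y_{hi} is the response. This is the general form of Taylor linearization, which is the default variance method for Stata's survey commands. It is a sketch, not a transcription of Stata's internal computation.

```latex
\hat{\bar{y}} = \frac{\sum_{h=1}^{H}\sum_{i=1}^{n_h} w_{hi}\, y_{hi}}
                     {\sum_{h=1}^{H}\sum_{i=1}^{n_h} w_{hi}}

\hat{V}\!\left(\hat{\bar{y}}\right)
  = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left(z_{hi} - \bar{z}_h\right)^2,
\qquad
z_{hi} = \frac{w_{hi}\left(y_{hi} - \hat{\bar{y}}\right)}
              {\sum_{h=1}^{H}\sum_{i=1}^{n_h} w_{hi}},
\qquad
\bar{z}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} z_{hi}
```

The standard error is the square root of the variance estimate. Because deviations are taken around each stratum's own mean, differences between strata do not contribute to the variance.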
This example presents mean and proportion estimation, with standard errors and confidence limits, using a dataset with a stratified sampling design. Statistics for three variables are obtained:
Each item uses a response scale with five options:
for which we obtain proportions.
For each item, we created a numeric variable valued in minutes (c2n c5n c8n):
from which we obtain means. The sampling design is stratified with unequal selection probabilities between strata, so the dataset includes a weight variable (inv_prob) and a stratum variable (localcouncil).
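Before declaring the design, it can be useful to look at the design variables. The sketch below assumes the guide's dataset is already open and uses only the variable names given above. The idea that weights are constant within a stratum is an assumption based on the description of the design, not something confirmed by the source.

```stata
* Assumes the guide's dataset is already open in Stata.
* Storage types and labels of the analysis and design variables
describe c2 c5 c8 c2n c5n c8n inv_prob localcouncil

* Number of sampled households in each stratum
tabulate localcouncil

* Weight summary by stratum; if each stratum has a single sampling rate
* (an assumption), min and max should agree within every stratum
tabstat inv_prob, by(localcouncil) statistics(n min max)

* Count missing values in the numeric items and the weight
misstable summarize c2n c5n c8n inv_prob
```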
In Stata, information about sampling design is made part of the dataset through the svyset command. The svyset command gives Stata the four pieces of information necessary to produce proper point estimates and standard errors from survey data:
The syntax for the svyset command is:
svyset psuvar [pweight=wgtvar], strata(stratvar) fpc(fpcvar)
where psuvar is the primary sampling unit indicator variable, wgtvar is the sampling weight variable, stratvar is the stratum indicator variable, and fpcvar is the population total (within strata) variable. psuvar is required; all others are optional. In this dataset, the primary and final sampling units are the same (household), and there is no finite population correction. The stratum indicator variable is localcouncil and the sampling weight variable is inv_prob. The svyset command is:
svyset _n [pweight=inv_prob], strata(localcouncil)
Because psuvar is required and there is one line of data per household, _n, Stata's automatic data line indicator, is used as the psuvar.
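Once the design has been declared, it can be checked before any estimation is run. The following is a minimal sketch, assuming the guide's dataset is open. Typing svyset with no arguments redisplays the current settings, and svydescribe lists the strata with the number of sampling units in each.

```stata
* Assumes the guide's dataset is already open in Stata.
* Declare the design as described above
svyset _n [pweight=inv_prob], strata(localcouncil)

* Redisplay the stored design settings
svyset

* Strata and number of sampling units per stratum
svydescribe
```

A stratum with only one sampled unit would show up in the svydescribe table. Such strata prevent standard errors from being computed under the default settings.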
A wide variety of commands that use survey data are available in Stata. For this example, we use the mean and proportion commands. To run a command and make use of the complex sampling design information, the command is preceded by svy:. Otherwise the syntax is as usual for the command. For example, to obtain means, we use the mean command:
svy: mean c2n c5n c8n
For proportions, we use the proportion command:
svy: proportion c2 c5 c8
A list of Stata commands that support the svy: prefix is available at https://www.stata.com/manuals13/svysvyestimation.pdf
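Two small variations on the commands above may also be helpful. This is a sketch, assuming the design has been declared with svyset as shown earlier. The 90% level is an arbitrary choice for illustration. Reporting options such as level() are placed on the svy prefix itself, and estat effects reports design effects after a survey estimation command.

```stata
* Assumes the dataset is open and svyset has been run as shown earlier.
* Means with 90% rather than the default 95% confidence intervals
svy, level(90): mean c2n c5n c8n

* Design effects (DEFF and DEFT) for the means just estimated
estat effects

* Proportions for a single item
svy: proportion c2
```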
When the svy: prefix is invoked, the output looks very similar to what users are accustomed to. For example, the output for the mean command without using svy: is:
With the svy: prefix:
For proportions, without the svy: prefix:
With the svy: prefix:
You can download this sample dataset along with a guide showing how to obtain mean and percent estimates using stratification information. The sample includes ordered, nominal, and quantitative variables, so a variety of statistical models may be tried. See whether you can reproduce the results presented here as well as obtain estimates using other Stata commands with which you are familiar.
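Before working with the downloaded data, the mechanics can be checked on a tiny made-up dataset. Everything in the sketch below (the variables stratum, y, and w, and all values) is hypothetical and unrelated to the sample dataset. It has two strata with different weights, so the unweighted mean (27.5) and the design-based mean (460/14, about 32.86) differ.

```stata
* Hypothetical toy data: two strata with different sampling weights
clear
input stratum y w
1 10 2
1 20 2
2 30 5
2 50 5
end

* Unweighted mean, as if the data were a simple random sample
mean y

* Declare the stratified design, then estimate the design-based mean
svyset _n [pweight=w], strata(stratum)
svy: mean y

* Weighted mean computed by hand, for comparison with the svy: estimate
display (2*10 + 2*20 + 5*30 + 5*50) / (2 + 2 + 5 + 5)
```

The svy: estimate should match the hand-computed weighted mean. The standard errors from the two mean commands differ as well, because svy: uses the stratum information.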