A survey may be a census of the universe (the study population) or may be conducted with a sample that represents the universe. Either a census or a sample survey requires a sampling frame. For a census, the frame will consist of a list of all the known units in the universe, and each unit will need to be surveyed. For a sample survey, the frame is a list of the target population from which the sample is selected. Ideally it should contain all elements in the population, but in practice many frames do not.
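The difference between a census and a sample survey, both drawn from the same frame, can be sketched in a few lines of Python. The frame below is a hypothetical list of 100 unit identifiers, and the sample is a simple random sample without replacement; this is an illustrative assumption, since real surveys often use more complex designs.

```python
import random

def select_units(frame, n=None, seed=None):
    """Return the units to survey from a sampling frame.

    If n is None, every unit on the frame is returned (a census);
    otherwise a simple random sample of n distinct units is drawn
    without replacement.
    """
    if n is None:
        return list(frame)                # census: survey every listed unit
    if n > len(frame):
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)             # seeded so the draw is reproducible
    return rng.sample(list(frame), n)     # sample: n distinct units

# Hypothetical frame of 100 units.
frame = [f"unit-{i}" for i in range(1, 101)]
census = select_units(frame)
sample = select_units(frame, n=10, seed=42)
print(len(census), len(sample))  # 100 10
```

Either way, only units that appear on the frame can be selected, which is why the frame's quality bounds the quality of the survey.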
The quality of the sample and, to an extent, of the survey itself depends on the quality of the sampling frame. Selecting a sampling frame that is of high quality and appropriate both to the population being studied and to the data collection method is a key step in planning a survey. In selecting a sampling frame, three questions should be asked: (1) Does it include members of the universe being studied? (2) Is it appropriate for the way the data will be collected? (3) What is the quality of the frame in terms of coverage, completeness, and accuracy?
Major categories of sampling frames are area frames for in-person interviews, random-digit dialing (RDD) frames for telephone survey samples, and a variety of lists used for all types of surveys. Few lists that are used as sampling frames were created specifically for that use. Exceptions are commercially available RDD frames.
The type of frame usually varies with the mode of interviewing, although many frames can be used for multiple modes. Some studies employ multiple frames, either because they use multiple modes of data collection, because no single frame has adequate coverage, or to facilitate oversampling of certain groups.
An in-person survey of households (or individuals living in households) may use multiple levels of frames: an area frame to select a sample of areas where interviews are conducted, and within the areas, lists of addresses compiled by field staff or obtained from commercial sources.
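The two levels of frames can be sketched as a two-stage selection. This minimal example assumes a hypothetical area frame stored as a dictionary that maps each area to the addresses listed for it. For simplicity both stages use equal-probability selection; production area samples commonly select areas with probability proportional to size, which this sketch does not attempt.

```python
import random

def two_stage_sample(area_frame, n_areas, n_per_area, seed=None):
    """Stage 1: sample areas from the area frame.
    Stage 2: sample addresses from the list compiled for each sampled area.

    area_frame maps an area ID to the list of addresses listed in that area.
    """
    rng = random.Random(seed)
    areas = rng.sample(sorted(area_frame), n_areas)   # stage 1: area sample
    selected = {}
    for area in areas:
        addresses = area_frame[area]
        k = min(n_per_area, len(addresses))           # small areas: take them all
        selected[area] = rng.sample(addresses, k)     # stage 2: address sample
    return selected

# Hypothetical area frame with address lists compiled by field staff.
area_frame = {
    "tract-01": ["12 Oak St", "14 Oak St", "16 Oak St", "3 Elm St"],
    "tract-02": ["1 Main St", "5 Main St"],
    "tract-03": ["7 Pine Rd", "9 Pine Rd", "11 Pine Rd"],
}
sample = two_stage_sample(area_frame, n_areas=2, n_per_area=2, seed=1)
print(sample)
```

Only the sampled areas need address lists at the second stage, which is what makes this approach practical when no complete national list of addresses is available.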
Telephone household surveys may employ RDD frames, directory-based frames, or a combination. Telephone surveys of businesses often use frames developed from telephone directories. Telephone surveys can also draw samples from lists compiled by many sources, including government agencies, commercial list vendors, associations, and societies. Some of these lists are publicly available, and some can be used only when doing studies for the owner of the list. Examples of publicly available lists include lists of public school districts and schools maintained by the National Center for Education Statistics (there are also commercial frames of districts and schools) and lists of physicians maintained by the American Medical Association. Lists whose use is restricted include those of recipients of government assistance and customers of businesses.
Surveys conducted by regular mail or email often use as frames the same lists that telephone surveys use (described in the previous paragraph). Web surveys can also use these lists to contact respondents by regular mail and ask them to complete a questionnaire online. Another type of frame for Web surveys comprises one or more Web portals (Web sites that provide links to other Web sites).
Ideally, the sampling frame will list every member of the study population once, and only once, and will include only members of the study population. The term coverage refers to the extent to which these criteria are met. In addition, the frame should be complete in terms of having information needed to select the sample and conduct the survey, and the information on the frame should be accurate.
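When a trustworthy reference list of the population happens to be available (an assumption that rarely holds in full), the coverage criteria above can be checked mechanically. The sketch below compares frame entries with such a hypothetical reference list and reports units missing from the frame, entries that are not in the universe, and units listed more than once.

```python
from collections import Counter

def coverage_report(frame_ids, universe_ids):
    """Compare frame entries with a reference list of the study population.

    Returns units in the universe but not on the frame (undercoverage),
    frame entries not in the universe (overcoverage), and units that
    appear on the frame more than once (duplicates).
    """
    counts = Counter(frame_ids)
    universe = set(universe_ids)
    on_frame = set(counts)
    return {
        "undercoverage": sorted(universe - on_frame),
        "overcoverage": sorted(on_frame - universe),
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
    }

# Hypothetical student roster: S4 enrolled after the list was compiled,
# S9 transferred out, and S2 was entered twice.
frame = ["S1", "S2", "S2", "S3", "S9"]
universe = ["S1", "S2", "S3", "S4"]
report = coverage_report(frame, universe)
print(report)
# {'undercoverage': ['S4'], 'overcoverage': ['S9'], 'duplicates': ['S2']}
```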
Needless to say, almost no sampling frame is perfect. Examining the quality of a frame using the criteria discussed in this section may lead to looking for an alternative frame or to taking steps to deal with the frame's shortcomings.
Problems in frame coverage include both undercoverage and overcoverage. Undercoverage means that some members of the universe are neither on the frame nor represented on it. Some examples of undercoverage are the following:
- All landline RDD frames exclude households with no telephone service and those with only cellular telephone service.
- Frames drawn from telephone directories exclude those same households, plus households with unpublished numbers and those whose numbers were assigned too recently to appear in the directory.
- New construction may be excluded from lists of addresses used as sampling frames for surveys conducted by mail or personal visit.
- Commercial lists of business establishments exclude many new businesses and may underrepresent small ones.
Frames can also suffer from undercoverage introduced by self-selection bias, as in the case of "panels" recruited for Internet research, even if the panels were recruited from a survey that used a probability sample with a good frame.
Overcoverage means that some elements on the frame are not members of the universe. For example, RDD frames contain nonworking and business telephone numbers, as well as household numbers. A frame may have both undercoverage and overcoverage. For example, to select a sample of students enrolled in a school, one might use a list provided by the school or the district; however, the list might include students who had dropped out or transferred and omit students who had enrolled after the list was compiled.
Frame undercoverage can lead to bias in estimates made from survey data. Overcoverage can lead to bias if ineligible units on the frame are not identified. However, the larger problem with overcoverage is usually one of cost, because ineligibles must be identified and screened out. If the ineligibles can be identified before selecting the sample, it is usually better to eliminate them at that time.
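The two ways of handling overcoverage can be sketched together. Records already known to be ineligible are dropped before selection; for the remaining records, whose eligibility is only known after screening, the number of records to select is inflated by an expected eligibility rate. The record layout, status codes, and the 60% eligibility rate below are hypothetical.

```python
import math

def remove_known_ineligibles(frame, is_ineligible):
    """Drop records already known to be outside the universe before sampling."""
    return [rec for rec in frame if not is_ineligible(rec)]

def required_sample_size(target_eligible, eligibility_rate):
    """Number of frame records to select so that, on average,
    target_eligible of them turn out to be eligible after screening."""
    if not 0 < eligibility_rate <= 1:
        raise ValueError("eligibility_rate must be in (0, 1]")
    return math.ceil(target_eligible / eligibility_rate)

# Hypothetical telephone frame with a status flag from a prior lookup.
frame = [
    {"number": "555-0100", "status": "residential"},
    {"number": "555-0101", "status": "business"},
    {"number": "555-0102", "status": "unknown"},
    {"number": "555-0103", "status": "nonworking"},
]
clean = remove_known_ineligibles(
    frame, lambda rec: rec["status"] in {"business", "nonworking"}
)
print([rec["number"] for rec in clean])  # ['555-0100', '555-0102']

# Assume 60% of the remaining numbers are eligible households.
print(required_sample_size(500, 0.6))    # 834
```

Every ineligible record removed up front raises the eligibility rate of what remains, which lowers the number of records that must be selected and screened.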
An issue related to coverage is that of duplicates on the frame, which can lead to units having unequal chances of selection. It is best to eliminate duplicates before selecting the sample. If this cannot be done, then the presence of duplicates should be determined for those units that are sampled, so the sample can be properly weighted.
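Both remedies for duplicates can be sketched briefly, assuming each frame record carries a clean unit identifier (in practice, matching variant listings of the same unit is often the hard part). The fallback divides a sampled unit's base weight by its number of listings. This is an approximation: under simple random sampling of n listings from N, a unit listed k times is selected with probability 1 - C(N-k, n)/C(N, n), which is close to kn/N when the sampling fraction is small. The data and the base weight of 2.0 are hypothetical.

```python
from collections import Counter

def deduplicate(frame, key):
    """Keep only the first record for each unit, identified by key(record)."""
    seen = set()
    unique = []
    for rec in frame:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique

def multiplicity_weights(sampled_units, listings_per_unit, base_weight):
    """Divide each sampled unit's base weight by its number of frame listings.

    Approximation for small sampling fractions: a unit listed k times has
    roughly k times the selection chance of a unit listed once.
    """
    return {u: base_weight / listings_per_unit[u] for u in sampled_units}

# Hypothetical frame in which business "B" appears under two listings.
frame = [
    {"id": "A", "name": "Alpha Co"},
    {"id": "B", "name": "Beta Co"},
    {"id": "B", "name": "Beta Company"},
    {"id": "C", "name": "Gamma Co"},
]

# Preferred: remove duplicates before selecting the sample.
unique_frame = deduplicate(frame, key=lambda rec: rec["id"])
print([rec["id"] for rec in unique_frame])  # ['A', 'B', 'C']

# Fallback: after sampling from the raw frame, count listings for sampled units.
listings = Counter(rec["id"] for rec in frame)
weights = multiplicity_weights(["A", "B"], listings, base_weight=2.0)
print(weights)  # {'A': 2.0, 'B': 1.0}
```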
In addition to issues of coverage, a sampling frame should have information that is complete and accurate. For a sampling frame to be complete, it must have enough information so that the sampled units can be identified and located. Further, this information should be accurate. Missing or inaccurate information on the frame can affect the survey's response rate and data collection costs.
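Completeness can be screened automatically before sampling; accuracy usually cannot, because it requires checking frame entries against the real world. The sketch below flags records that lack any of a set of required fields. The field names are hypothetical stand-ins for whatever information is needed to identify and contact a unit in a particular survey.

```python
# Hypothetical fields needed to identify and contact a sampled unit.
REQUIRED_FIELDS = ("name", "address", "phone")

def incomplete_records(frame, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for each record lacking a required field.

    A field counts as missing if it is absent, None, or blank.
    """
    problems = []
    for i, rec in enumerate(frame):
        missing = [f for f in required if not str(rec.get(f) or "").strip()]
        if missing:
            problems.append((i, missing))
    return problems

frame = [
    {"name": "Acme Ltd", "address": "1 Main St", "phone": "555-0100"},
    {"name": "Beta Co", "address": "", "phone": None},
    {"name": "Gamma Inc", "address": "2 Elm St"},
]
print(incomplete_records(frame))
# [(1, ['address', 'phone']), (2, ['phone'])]
```

Records flagged this way can be updated from another source before selection, or, if sampled, routed to tracing so that missing contact information does not turn into nonresponse.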