Sampling occurs when researchers examine a portion or sample of a larger group of potential participants and use the results to make statements that apply to this broader group or population. The extent to which the research findings can be generalized or applied to the larger group or population is an indication of the external validity of the research design. The process of selecting a sample is an integral part of designing sound research. An awareness of the principles of sampling design is imperative to the development of research with strong external validity. In theory, a sound sampling method will result in a sample that is free from bias (each individual in the population has an equal chance of being selected) and is reliable (a sample will yield the same or comparable results if the research were repeated).
A sample that is free from bias and reliable is said to be representative of the entire population of interest. A representative sample adequately reflects the properties of interest of the population being examined, thus enabling the researcher to study the sample but draw valid conclusions about the larger population of interest. If the sampling procedures are flawed, then the time and effort put into data collection and analysis can lead to erroneous inferences. A poor sample could lead to meaningless findings based on research that is fundamentally flawed. Researchers use sampling procedures to select units from a population. In social science research, the units being selected are commonly individuals, but they can also be couples, organizations, groups, cities, and so on.
This entry begins by detailing the steps in the sampling process. Next, this entry describes the types of sampling. The entry ends with a discussion of evaluating sampling and determining sample size.
The sampling process can be diagrammed as shown in Figure 1.
Identifying the population or entire group of interest is an important first step in designing the sampling method. This entire population is often referred to as the theoretical or target population because it includes all of the participants of theoretical interest to the researcher. These are the individuals about whom the researcher is interested in making generalizations. Examples of possible theoretical populations are all high school principals in the United States, all couples over age 80 in the world, and all adults with chronic fatigue syndrome. It is hardly ever possible to study the entire theoretical population, so a portion of this theoretical population that is accessible (the accessible population or sampling frame) is identified. Researchers define the accessible population or sampling frame based on the participants to whom they have access. Examples of accessible populations might be the high school principals in the state of Colorado, couples over age 80 who participate in a community activity targeting seniors, or patients who have visited a particular clinic for the treatment of chronic fatigue syndrome. From this accessible population, the researcher might employ a sampling design to create the selected sample, which is the smaller group of individuals selected from the accessible population. These individuals are asked by the researcher to participate in the study. For example, one might sample high school principals by selecting a random sample of 10 school districts within the state of Colorado. In other cases, the accessible population might be small enough that the researcher selects all these individuals as the selected “sample.” For example, a researcher studying couples older than age 80 who participate in a particular community activity could choose to study all the couples in that group rather than only some of them. In this case, the accessible population and the selected sample are the same.
As a third example, the patients treated during a certain 3-month period could be chosen as the selected sample from the accessible population of all individuals seeking treatment for chronic fatigue syndrome at a particular clinic.
Finally, the researcher has the actual sample, which is composed of the individuals who agree to participate and whose data are actually used in the analysis. For example, if there were 50 older couples at the community activity, perhaps only 30 (a 60% response rate) would send back the questionnaire.
The advantages of using a sample rather than examining a whole population include cost-effectiveness, speed, convenience, and potential for improved quality and reliability of the research. Designing a sound sampling procedure takes time and the cost per individual examined might be higher, but the overall cost of the study is reduced when a sample is selected. Studying a sample rather than the entire population means that fewer data need to be collected, which can shorten the time lag and improve data quality. Finally, by examining a sample as opposed to an entire population, the researcher might be able to examine a greater scope of variables and content than would otherwise be feasible.
Although there are advantages to using sampling, there are some cautions to note. The virtues mentioned previously can also end up being limitations if the sampling procedures do not produce a representative sample. If a researcher is not prudent in selecting and designing a sampling method, then a biased and/or unreliable sample might be produced. Knowledge of what biases might potentially result from the choice of sampling method can be elusive. The errors can be especially large when the number of sample observations falling within a single cell is small. For example, if one is interested in making comparisons among high school principals of various ethnic groups, then the sampling process described previously might prove problematic. A random sample of 10 Colorado school districts might produce few principals who were ethnic minorities because the large urban districts, which are more likely to have minority principals, make up a relatively small proportion of all districts. It is possible that none of these districts would be randomly selected.
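The risk that a random draw of districts misses every district of a particular kind can be estimated with a hypergeometric calculation. The following is a minimal sketch, assuming districts are drawn by simple random sampling without replacement. The function name and the district counts are hypothetical, chosen only for illustration; they are not actual Colorado figures.

```python
from math import comb


def prob_no_target_cluster(total_clusters, target_clusters, n_selected):
    """Probability that a simple random draw of n_selected clusters
    (without replacement) contains none of the target clusters."""
    non_target = total_clusters - target_clusters
    # Ways to choose only non-target clusters, divided by all possible draws.
    return comb(non_target, n_selected) / comb(total_clusters, n_selected)


# Hypothetical numbers: 180 districts, 10 of them large urban, 10 selected.
p_miss_all = prob_no_target_cluster(total_clusters=180,
                                    target_clusters=10,
                                    n_selected=10)
print(f"Chance that no large urban district is selected: {p_miss_all:.2f}")
```

When this probability is appreciable, a design that samples the rare type of district separately, such as stratified sampling, might be worth considering.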
When each individual in a population is studied, a census rather than a sample is used. In this instance, the accessible population contains the same individuals as the selected sample. When sampling methods are employed, researchers have two broad processes to choose from: probability sampling and nonprobability sampling. If using probability sampling, the researcher must have a list of and access to each individual in the accessible population from which the sample is being drawn. Furthermore, each member of the population must have a known, nonzero chance of being selected. Nonprobability sampling, in contrast, is used when the researcher does not have access to the whole accessible population and cannot state the likelihood of an individual being selected for the sample.
Probability sampling is more likely to result in a representative sample and to meet the underlying mathematical assumptions of statistical analysis. The use of some kind of random selection process increases the likelihood of obtaining a representative sample. Simple random sampling is the most basic form of probability sampling, in which a random number table or random number generator is used to select participants from a list or sampling frame of the accessible population. Stratified random sampling enables the researcher to divide the accessible population on some important characteristic, like geographical region. In this way, each of these strata, or segments of the population, can be studied independently. Other types of probability sampling include systematic sampling and cluster sampling. The earlier example involving the selection of high school principals used a two-stage cluster sampling procedure, first randomly selecting, for example, 10 school districts and then interviewing all the principals in those 10 districts. This procedure would make travel for the observation of principals much more feasible, while still selecting a probability sample.
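The three designs described above can be sketched in a few lines of Python using the standard library's random number generator. This is an illustrative sketch only: the sampling frame of 200 principals in 40 districts and 4 regions is invented, and the function names are hypothetical. The stratified version draws an equal number from each stratum, which is one of several possible allocation rules.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units from the frame, each with an equal chance."""
    return rng.sample(frame, n)


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into strata, then draw randomly within each one."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}


def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters, then keep every unit in them."""
    clusters = {}
    for unit in frame:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Hypothetical frame: 200 principals, 40 districts, 4 regions.
REGIONS = ("north", "south", "east", "west")
frame = [{"id": i, "district": i % 40, "region": REGIONS[(i % 40) % 4]}
         for i in range(200)]

rng = random.Random(42)  # fixed seed so the draw can be repeated
srs = simple_random_sample(frame, 20, rng)
by_region = stratified_sample(frame, lambda u: u["region"], 5, rng)
in_districts = cluster_sample(frame, lambda u: u["district"], 10, rng)
```

Note that the cluster sample's size depends on how many units the chosen clusters happen to contain, whereas the other two designs fix the number of units in advance.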
Probability samples, although considered preferable, are not always practical or feasible. In these cases, nonprobability sampling designs are used. As the name would imply, nonprobability sampling designs do not involve random selection; therefore, not everyone has an equal chance of selection. This does not necessarily lead to a sample that is not representative of the population. The level of representativeness, however, is difficult to determine.
A convenience sampling design is employed when the researcher uses individuals who are readily available. In quota sampling, quotas are set for the number of participants in each particular category to be included in the study. The categories chosen differ depending on what is being studied but, for example, quotas might be set on the number of men and women or employed or unemployed individuals.
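Quota sampling can be sketched as filling each category in the order that individuals become available, with no random selection involved. The example below is a minimal illustration under that assumption; the function name, the stream of available individuals, and the quotas of five employed and five unemployed participants are all hypothetical.

```python
def quota_sample(available, category_of, quotas):
    """Accept individuals in the order they become available until
    the quota for every category has been filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not changed
    selected = []
    for person in available:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota is full
    return selected


# Hypothetical stream of readily available individuals.
available = [{"id": i, "employed": i % 3 != 0} for i in range(30)]
sample = quota_sample(available,
                      category_of=lambda p: p["employed"],
                      quotas={True: 5, False: 5})
```

Because who is accepted depends entirely on who happens to be available first, the chance that any given individual is selected cannot be stated, which is what distinguishes this design from stratified random sampling.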
A variety of steps in the sampling process might lead to an unrepresentative sample. First is the researcher's selection of the accessible population. Most often, the accessible population is chosen because it is the group that is readily available to the researcher. If this accessible population does not mirror the theoretical population in relation to the variables of interest, the representativeness of the resulting sample is compromised. The researcher's choice of sampling design is another entry point for error. It is not very likely that a nonprobability sample is representative of the population, and it is not possible to measure the extent to which it differs from the population of interest. A poor response rate and participant attrition can also lead to an unrepresentative sample. The effects of nonresponse and attrition can be dramatic if there are systematic differences between the individuals who responded and those who did not respond or who dropped out. Often, there is no way to know the extent to which the resulting sample is biased, because nonrespondents and dropouts might differ in important ways from those who responded.
There is no single straightforward answer in relation to questions regarding how large a sample should be to be representative of the entire population of interest. Calculating power is the technically correct way to plan ahead of time how many participants are needed to detect a result of a certain effect size. The underlying motivation behind selecting a sampling method is the desire for a sample that is representative of the population of interest. The size of the selected sample depends partly on the extent to which the population varies with regard to the key characteristics being examined. Sometimes, if the group is fairly homogeneous on the characteristics of interest, one can be relatively sure that a small probability sample is representative of the entire population. If people are very similar (as in a selected interest group, individuals with a certain syndrome, etc.), one might suspect the sample size would not need to be as large as would be needed with a diverse group. The level of accuracy desired is another factor in determining the sample size. If the researcher is willing to tolerate a higher level of error for the sake of obtaining the results quickly, then he/she might choose to use a smaller sample that can be more quickly studied. A researcher must also consider the research methodology chosen when determining the sample size. Some methodologies, such as mailed surveys, have lower typical response rates. To obtain an actual sample of sufficient size, a researcher must take the anticipated response rate into consideration. Practical considerations based on time and money are probably the most common deciding factors in determining sample size.
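As a rough illustration of planning sample size ahead of time, the sketch below combines a power calculation with an adjustment for the anticipated response rate. It assumes a two-sided comparison of two group means, a standardized effect size, and the common normal approximation; other designs call for other formulas. The function names, the effect size of 0.5, and the 25% response rate are hypothetical choices for the example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_power(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a standardized
    mean difference (normal approximation, two-sided test)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def invitations_needed(target_n, expected_response_rate):
    """Size of the selected sample needed to end up with an actual
    sample of target_n, given the anticipated response rate."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("response rate must be in (0, 1]")
    return ceil(target_n / expected_response_rate)


per_group = n_per_group_for_power(effect_size=0.5)  # medium effect, assumed
to_invite = invitations_needed(per_group, expected_response_rate=0.25)
print(per_group, "per group;", to_invite, "invitations per group")
```

Smaller assumed effects or lower response rates increase these numbers quickly, which is one reason practical constraints so often end up driving the final decision.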