# Sampling

Sampling occurs when researchers examine a portion, or sample, of a larger group of potential participants and use the results to make statements that apply to this broader group, or population. The extent to which the research findings can be generalized to the population is an indication of the external validity of the research design. Selecting a sample is therefore an integral part of designing sound research, and an awareness of the principles of sampling design is imperative to the development of research with strong external validity. In theory, a sound sampling method will result in a sample that is free from bias (each individual in the population has an equal chance of being selected) and is reliable (the sample would yield the same or comparable results if the research were repeated).

A sample that is free from bias and reliable is said to be representative of the entire population of interest. A representative sample adequately reflects the properties of interest of the population being examined, thus enabling the researcher to study the sample but draw valid conclusions about the larger population of interest. If the sampling procedures are flawed, then the time and effort put into data collection and analysis can lead to erroneous inferences. A poor sample could lead to meaningless findings based on research that is fundamentally flawed. Researchers use sampling procedures to select units from a population. In social science research, the units being selected are commonly individuals, but they can also be couples, organizations, groups, cities, and so on.

This entry begins by detailing the steps in the sampling process. Next, this entry describes the types of sampling. The entry ends with a discussion of evaluating sampling and determining sample size.

### Steps in the Sampling Process

The sampling process can be diagrammed as shown in Figure 1.

Identifying the population, or entire group of interest, is an important first step in designing the sampling method. This entire population is often referred to as the theoretical or target population because it includes all of the participants of theoretical interest to the researcher. These are the individuals about whom the researcher is interested in making generalizations. Examples of possible theoretical populations are all high school principals in the United States, all couples over age 80 in the world, and all adults with chronic fatigue syndrome.

It is hardly ever possible to study the entire theoretical population, so a portion of it that is accessible (the accessible population or sampling frame) is identified. Researchers define the accessible population based on the participants to whom they have access. Examples of accessible populations might be the high school principals in the state of Colorado, couples over age 80 who participate in a community activity targeting seniors, or patients who have visited a particular clinic for the treatment of chronic fatigue syndrome.

From this accessible population, the researcher might employ a sampling design to create the selected sample, which is the smaller group of individuals selected from the accessible population. These individuals are asked by the researcher to participate in the study. For example, one might sample high school principals by selecting a random sample of 10 school districts within the state of Colorado. As a second example, the patients treated during a certain 3-month period could be chosen as the selected sample from the accessible population of all individuals seeking treatment for chronic fatigue syndrome at a particular clinic. In other cases, the accessible population might be small enough that the researcher selects all of its members as the selected “sample.” For example, a researcher studying couples older than age 80 who participate in a particular community activity could choose to study all the couples in that group rather than only some of them. In this case, the accessible population and the selected sample are the same.

Figure 1. Sampling Process Diagram

Finally, the researcher has the actual sample, which is composed of the individuals who agree to participate and whose data are actually used in the analysis. For example, if there were 50 older couples at the community activity, perhaps only 30 (a 60% response rate) would send back the questionnaire.

The advantages of using a sample rather than examining a whole population include cost-effectiveness, speed, convenience, and the potential for improved quality and reliability of the research. Designing a sound sampling procedure takes time, and the cost per individual examined might be higher, but the overall cost of the study is reduced when a sample is selected. Studying a sample rather than the entire population means that fewer data need to be collected, which can produce a shorter time lag and better quality data. Finally, by examining a sample as opposed to an entire population, the researcher might be able to examine a greater scope of variables and content than would otherwise be possible.

Although there are advantages to using sampling, there are some cautions to note. The virtues mentioned previously can become limitations if the sampling procedures do not produce a representative sample. If a researcher is not prudent in selecting and designing a sampling method, then a biased and/or unreliable sample might be produced. The biases that might result from a given sampling method can be difficult to anticipate. The errors can be especially large when the number of sample observations falling within a single cell is small. For example, if one is interested in making comparisons among high school principals of various ethnic groups, then the sampling process described previously might prove problematic. A random sample of 10 Colorado school districts might produce few principals who are ethnic minorities because large urban districts, which are more likely to have minority principals, make up a relatively small proportion of all districts. It is possible that none of these districts would be randomly selected.
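The risk described above, that a random draw of districts contains none of the large urban ones, can be quantified with a hypergeometric probability. The following is a minimal sketch; the function name `prob_none_selected` and the district counts are hypothetical values chosen for illustration, not figures for Colorado.

```python
from math import comb


def prob_none_selected(total_districts: int, urban_districts: int, sample_size: int) -> float:
    """Probability that a simple random sample of districts contains no urban district.

    This is the hypergeometric probability of zero "successes": the whole
    sample has to be drawn from the non-urban districts.
    """
    if not 0 <= sample_size <= total_districts:
        raise ValueError("sample_size must be between 0 and total_districts")
    non_urban = total_districts - urban_districts
    return comb(non_urban, sample_size) / comb(total_districts, sample_size)


# Hypothetical numbers for illustration only: 180 districts, 10 of them large and urban.
p_miss = prob_none_selected(total_districts=180, urban_districts=10, sample_size=10)
print(f"Chance that no urban district is drawn: {p_miss:.2f}")
```

Under assumptions like these, the chance of drawing no urban district is far from negligible, which is one reason a researcher might stratify by district type before sampling.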

### Types of Sampling

When each individual in a population is studied, a census rather than a sample is used. In this instance, the accessible population contains the same individuals as does the selected sample. When sampling methods are employed, researchers have two broad processes to choose from: probability sampling and nonprobability sampling. If using probability sampling, the researcher must have a list of, and access to, each individual in the accessible population from which the sample is being drawn. Furthermore, each member of the population must have a known, nonzero chance of being selected. Nonprobability sampling, in contrast, is used when the researcher does not have access to the whole accessible population and cannot state the likelihood of an individual being selected for the sample.

### Probability Sampling

Probability sampling is more likely to result in a representative sample and to meet the underlying mathematical assumptions of statistical analysis. The use of some kind of random selection process increases the likelihood of obtaining a representative sample. Simple random sampling is the most basic form of probability sampling, in which a random number table or random number generator is used to select participants from a list or sampling frame of the accessible population. Stratified random sampling enables the researcher to divide the accessible population on some important characteristic, such as geographical region. In this way, each of these strata, or segments of the population, can be studied independently. Other types of probability sampling include systematic sampling and cluster sampling. The earlier example involving the selection of high school principals used a cluster sampling procedure, first randomly selecting, for example, 10 school districts and then interviewing all the principals in those 10 districts. This procedure would make travel for the observation of principals much more feasible, while still selecting a probability sample.
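The four designs named above can be sketched with Python's standard `random` module. This is an illustrative sketch only: the sampling frame of 60 principals in 12 districts, the field names, and the function names are all hypothetical, and the systematic design shown uses the common "every k-th unit after a random start" rule, which the entry itself does not spell out.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit has the same chance."""
    return rng.sample(frame, n)


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum units inside each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start, with k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical sampling frame: 60 principals spread evenly over 12 districts.
rng = random.Random(42)
frame = [
    {"id": i, "district": f"D{i % 12:02d}", "region": "urban" if i % 12 < 3 else "rural"}
    for i in range(60)
]
districts = {}
for principal in frame:
    districts.setdefault(principal["district"], []).append(principal)

srs = simple_random_sample(frame, 10, rng)
by_region = stratified_sample(frame, lambda u: u["region"], 4, rng)
every_kth = systematic_sample(frame, 10, rng)
in_clusters = cluster_sample(districts, 3, rng)
```

In each function the selection probabilities are known in advance, which is what distinguishes these designs from the nonprobability designs discussed next.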

### Nonprobability Sampling

Probability samples, although considered preferable, are not always practical or feasible. In these cases, nonprobability sampling designs are used. As the name implies, nonprobability sampling designs do not involve random selection; therefore, the chance that any given individual is selected cannot be stated, and not everyone has an equal chance of selection. A nonprobability sample is not necessarily unrepresentative of the population. Its level of representativeness, however, is difficult to determine.

A convenience sampling design is employed when the researcher uses individuals who are readily available. In quota sampling, quotas are set for the number of participants in each particular category to be included in the study. The categories chosen differ depending on what is being studied; for example, quotas might be set on the number of men and women, or on the number of employed and unemployed individuals.
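The difference between the two designs can be illustrated with a short sketch. It assumes that volunteers arrive in some order and that quota sampling accepts them on a first-come basis until each category is full; the function name `quota_sample`, the participant labels, and the quotas are hypothetical.

```python
def quota_sample(arrivals, category_of, quotas):
    """Accept volunteers in arrival order until every category's quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person in arrivals:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas are full, so recruitment stops
    return selected


# Hypothetical stream of readily available volunteers: (label, employment status).
arrivals = [
    ("P01", "employed"), ("P02", "employed"), ("P03", "unemployed"),
    ("P04", "employed"), ("P05", "employed"), ("P06", "unemployed"),
    ("P07", "unemployed"), ("P08", "employed"),
]

# Convenience sample: simply the first five people who are available.
convenience = arrivals[:5]

# Quota sample: three employed and two unemployed participants.
quota = quota_sample(arrivals, lambda p: p[1], {"employed": 3, "unemployed": 2})
```

Neither procedure involves random selection, so the quota sample matches the target category counts without offering any way to state an individual's chance of being included.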

### Evaluating Sampling

A variety of steps in the sampling process might lead to an unrepresentative sample. First is the researcher's selection of the accessible population. Most often, the accessible population is chosen because it is the group that is readily available to the researcher. If this accessible population does not mirror the theoretical population in relation to the variables of interest, the representativeness of the resulting sample is compromised. The researcher's choice of sampling design is another entry point for error. It is not very likely that a nonprobability sample is representative of the population, and it is not possible to measure the extent to which it differs from the population of interest. A poor response rate and participant attrition can also lead to an unrepresentative sample. The effects of nonresponse and attrition can be dramatic if there are systematic differences between the individuals who took part and those who did not respond or who dropped out. Often, there is no way to know to what extent the resulting sample is biased, because those who did not respond or who dropped out might differ in important ways from those who took part.
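The effect of systematic nonresponse can be made concrete with a small simulation. The sketch below uses entirely synthetic data and one assumed response model, in which individuals with higher scores are more likely to return the questionnaire; the function name, the score distribution, and the response rule are illustrative assumptions and not part of the entry.

```python
import random
from statistics import mean


def simulate_nonresponse(n_selected, rng):
    """Return (mean score of the selected sample, mean score of the respondents)."""
    # Synthetic scores for everyone in the selected sample.
    selected = [rng.gauss(50, 10) for _ in range(n_selected)]

    def responds(score):
        # Assumption: the chance of responding rises with the score,
        # clipped so that it stays between 5% and 95%.
        p = min(0.95, max(0.05, 0.5 + (score - 50) / 40))
        return rng.random() < p

    respondents = [s for s in selected if responds(s)]
    return mean(selected), mean(respondents)


rng = random.Random(7)
selected_mean, respondent_mean = simulate_nonresponse(5000, rng)
print(f"selected sample mean: {selected_mean:.1f}")
print(f"respondent mean:      {respondent_mean:.1f}")
```

Under this assumed response model the respondents' mean overstates the mean of the selected sample. In real data the scores of nonrespondents are not observed, so the size of such a gap usually cannot be checked directly.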

### Sample Size

There is no single straightforward answer to the question of how large a sample should be to be representative of the entire population of interest. A power analysis is the technically correct way to plan ahead of time how many participants are needed to detect an effect of a given size. The underlying motivation behind selecting a sampling method is the desire for a sample that is representative of the population of interest. The size of the selected sample depends partly on the extent to which the population varies with regard to the key characteristics being examined. If the group is fairly homogeneous on the characteristics of interest (as in a selected interest group or among individuals with a certain syndrome), one can be relatively sure that a small probability sample is representative of the entire population, and the sample would not need to be as large as would be needed with a diverse group. The level of accuracy desired is another factor in determining the sample size. If the researcher is willing to tolerate a higher level of error for the sake of obtaining the results quickly, then a smaller sample that can be studied more quickly might be chosen. A researcher must also consider the research methodology chosen when determining the sample size. Some methodologies, such as mailed surveys, have lower typical response rates. To obtain an actual sample of sufficient size, a researcher must take the anticipated response rate into consideration. Practical considerations based on time and money are probably the most common deciding factors in determining sample size.
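Two of the calculations mentioned above can be sketched briefly. The first function uses a standard normal-approximation formula for comparing two group means with a two-sided test; this is one common simplification, chosen here for illustration, and an exact calculation based on the t distribution typically asks for a slightly larger sample. The second function inflates a target sample size by an anticipated response rate. The function names and the example values are assumptions.

```python
import math
from fractions import Fraction
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group mean comparison.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def invitations_needed(target_n, response_rate):
    """Selected-sample size needed to end up with target_n actual participants."""
    # Convert the rate to an exact fraction to avoid floating-point round-up.
    rate = Fraction(response_rate).limit_denominator(1000)
    return math.ceil(target_n / rate)


per_group = n_per_group(effect_size=0.5)           # 63 under this approximation
total_needed = 2 * per_group                       # 126 participants in total
to_invite = invitations_needed(total_needed, 0.6)  # 210 at a 60% response rate
```

With the 60% response rate from the earlier example of older couples, `invitations_needed(30, 0.6)` returns 50, which matches the 50 couples who were sent the questionnaire.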
