Cluster Sample

Mansour Fahimi

doi:10.4135/9781412963947


In: Encyclopedia of Survey Research Methods
Chapter DOI: https://doi.org/10.4135/9781412963947.n67
Subject: Anthropology, Business and Management, Criminology and Criminal Justice, Communication and Media Studies, Economics, Education, Geography, Health, Marketing, Nursing, Political Science and International Relations, Psychology, Social Policy and Public Policy, Social Work, Sociology


Unlike stratified sampling, where the available information about all units in the target population allows researchers to partition sampling units into groups (strata) that are relevant to a given study, there are situations in which the population (in particular, the sampling frame) can identify only predetermined groups, or clusters, of sampling units. For such situations, a cluster sample can be defined as a simple random sample in which the primary sampling units consist of clusters. Accordingly, effective clusters are those that are heterogeneous within and homogeneous across, the reverse of what is sought when forming effective strata. In area probability sampling, particularly when face-to-face data collection is planned, cluster samples are often used to reduce the geographic dispersion of sample units that can otherwise result from unrestricted sampling methods, such as simple or systematic random sampling. In this way, cluster samples provide more information per unit cost than other sample types. Consequently, cluster sampling is typically the method of choice when it is impractical to obtain a complete list of all sampling units across the population of interest, or when, for cost reasons, the selected units are to be confined to a limited sample of clusters. That is, feasibility and economy are the two main reasons why cluster samples are used in complex surveys of individuals, institutions, or items.

Operationally, clusters can be defined as collections of units that are geographic, temporal, or spatial in nature. For instance, counties or census blocks often serve as geographic clusters for household sampling; calendar years or months are used for temporal clustering; and boxes of components or plots of land are examples of spatial clusters of objects. Depending on the nature of a study and the extent of heterogeneity among units within each cluster, different numbers of clusters might be needed to secure reliable estimates from a cluster sample. When units within all clusters display the same variability with respect to the measure of interest as the target population as a whole, reasonable estimates can be generated from a small number of clusters. In contrast, when variability is small within but large across clusters, a larger number of clusters of smaller size might be needed to ensure stability.
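The selection step described above can be sketched in a few lines. This is a minimal illustration of one-stage cluster sampling, assuming a frame that lists only clusters (for example, census blocks) together with the units each contains; the function name and the toy frame are hypothetical. Clusters are drawn by simple random sampling, and every unit in a selected cluster enters the sample.

```python
import random
from typing import Dict, List, Tuple


def select_one_stage_cluster_sample(
    frame: Dict[str, List[str]],
    n_clusters: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw a simple random sample of clusters, then keep every unit in them.

    `frame` (hypothetical) maps each cluster ID, e.g. a census block, to the
    units it contains; only the list of cluster IDs is needed for selection.
    """
    rng = random.Random(seed)
    # Sort the IDs so that a given seed always yields the same draw.
    chosen = rng.sample(sorted(frame), n_clusters)
    # One-stage design: all units in each selected cluster are enumerated.
    return [(cid, unit) for cid in chosen for unit in frame[cid]]


frame = {
    "block_A": ["hh1", "hh2"],
    "block_B": ["hh3"],
    "block_C": ["hh4", "hh5", "hh6"],
}
sample = select_one_stage_cluster_sample(frame, n_clusters=2, seed=1)
```

Because only two of the three blocks are visited, interviewers travel to fewer locations, which is the cost advantage the entry describes; the price is that households within a block may resemble one another.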

Despite the feasibility and economic advantages of cluster samples, for a given sample size cluster sampling generally provides estimates that are less precise than those obtained from simple or stratified random samples. The main reason for this loss in precision is the inherent homogeneity of sampling units within selected clusters: units in a given cluster are often physically close and tend to have similar characteristics. Selecting more than one unit within the same cluster can therefore produce redundant information, an inefficiency that leads to higher standard errors for survey estimates.
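The precision loss can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical population of equal-sized clusters in which each unit's value is a shared cluster effect plus individual noise (a random-effects model chosen for illustration, not taken from this entry), and it compares the sampling variance of the mean under a simple random sample and under a one-stage cluster sample of the same total size. All function and parameter names are hypothetical.

```python
import random
import statistics
from typing import Tuple


def simulate_variances(
    n_clusters_pop: int = 200,
    cluster_size: int = 20,
    n_sample: int = 100,
    cluster_sd: float = 1.0,
    unit_sd: float = 1.0,
    reps: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (variance of SRS means, variance of cluster-sample means)."""
    rng = random.Random(seed)
    # Each unit = shared cluster effect + individual noise, so units in the
    # same cluster resemble one another (positive intraclass correlation).
    population = []
    for _ in range(n_clusters_pop):
        effect = rng.gauss(0.0, cluster_sd)
        population.append([effect + rng.gauss(0.0, unit_sd) for _ in range(cluster_size)])
    all_units = [y for cluster in population for y in cluster]
    # Whole clusters needed to match the SRS size (assumes exact divisibility).
    m = n_sample // cluster_size
    srs_means, cluster_means = [], []
    for _ in range(reps):
        # Simple random sample of n_sample individual units.
        srs_means.append(statistics.fmean(rng.sample(all_units, n_sample)))
        # One-stage cluster sample: m whole clusters, n_sample units in total.
        picked = rng.sample(population, m)
        cluster_means.append(statistics.fmean(y for c in picked for y in c))
    return statistics.variance(srs_means), statistics.variance(cluster_means)
```

Under this model the true intraclass correlation is cluster_sd² / (cluster_sd² + unit_sd²), which is 0.5 with the defaults, so the cluster-sample variance should come out several times larger than the SRS variance. Calling simulate_variances(cluster_sd=0.0) removes the within-cluster similarity, and the two variances then become comparable.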

Kish provided a model for estimating the inflation in the variance of survey estimates (and hence in their standard errors) due to clustering. Accordingly, this multiplicative clustering design effect, deff, can be estimated by

$$\mathrm{deff} = 1 + \rho\,(\bar{b} - 1).$$

In the preceding formulation, $\bar{b}$ represents the average cluster size and $\rho$ (rho) denotes the so-called intraclass correlation, an estimate of the relative homogeneity within clusters measured with respect to the key analytical objectives of the survey. The design effect approaches unity (no effect) as the average cluster size approaches 1, that is, as the design approaches simple random sampling with no clustering. As $\rho$ grows because of high correlation between sampling units within clusters, it becomes increasingly inefficient to select more than one unit from each cluster. Moreover, because $\rho$ is multiplied by $\bar{b} - 1$, even a moderate intraclass correlation can have a sizable inflationary effect on the standard errors when the average cluster size is large.
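Kish's formula is simple enough to compute directly. The sketch below, a minimal illustration with hypothetical function names, evaluates the design effect, the corresponding standard-error inflation factor (its square root, often written deft), and an effective sample size. Because the entry does not say how $\rho$ should be estimated, it also includes one common ANOVA-based estimator, which is an assumption here and requires every cluster to contain the same number of units.

```python
import math
from typing import Sequence


def kish_deff(rho: float, avg_cluster_size: float) -> float:
    """Kish's clustering design effect: deff = 1 + rho * (b_bar - 1)."""
    return 1.0 + rho * (avg_cluster_size - 1.0)


def deft(deff: float) -> float:
    """Standard-error inflation factor: square root of the variance deff."""
    return math.sqrt(deff)


def effective_sample_size(n: int, deff: float) -> float:
    """SRS size giving the same variance as n clustered units."""
    return n / deff


def anova_icc(clusters: Sequence[Sequence[float]]) -> float:
    """ANOVA estimate of rho for k clusters that all contain b units."""
    k, b = len(clusters), len(clusters[0])
    means = [sum(c) / b for c in clusters]
    grand = sum(means) / k  # equal sizes, so the mean of means is the grand mean
    # Between-cluster and within-cluster mean squares.
    msb = b * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c) / (k * (b - 1))
    return (msb - msw) / (msb + (b - 1) * msw)
```

For example, with $\rho = 0.05$ and clusters averaging 21 units, kish_deff returns 2.0: 1,000 clustered interviews then carry roughly the information of 500 simple random interviews, and standard errors grow by a factor of about 1.41. This is the "moderate correlation, large clusters" case the entry warns about.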

...
