Data Swapping

Thomas Krenzke

doi:10.4135/9781412963947

In: Encyclopedia of Survey Research Methods
Chapter DOI: https://doi.org/10.4135/9781412963947.n124
Subject: Anthropology, Business and Management, Criminology and Criminal Justice, Communication and Media Studies, Economics, Education, Geography, Health, Marketing, Nursing, Political Science and International Relations, Psychology, Social Policy and Public Policy, Social Work, Sociology

Data swapping, first introduced by Tore Dalenius and Steven Reiss in the late 1970s, is a perturbation method used for statistical disclosure control. Its objective is to reduce the risk that anyone can identify a respondent and his or her responses to questionnaire items by examining publicly released microdata or tables, while preserving the amount of data and its usefulness.

In general, the data swapping approach is implemented by creating pairs of records with similar attributes and then interchanging identifying or sensitive data values within each pair. As a simple example, suppose two survey respondents form a "swapping pair" because they are the same age. Suppose also that income categories are highly identifying and are therefore swapped to reduce the chance of disclosure. The first respondent makes between $50,000 and $60,000 annually, and the second makes between $40,000 and $50,000. After swapping, the first respondent is assigned the income category of $40,000 to $50,000, and the second respondent is assigned $50,000 to $60,000.
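
The age-and-income example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `swap_within_pairs`, the record layout, and the third respondent are hypothetical, and pairing records in file order within each age group is a simplification rather than a recommended pairing rule.

```python
def swap_within_pairs(records, match_key, swap_key):
    """Pair records that share match_key and exchange swap_key within each pair."""
    swapped = [dict(r) for r in records]  # copy so the input is left untouched
    groups = {}
    for idx, rec in enumerate(swapped):
        groups.setdefault(rec[match_key], []).append(idx)
    for indices in groups.values():
        # Walk each group two records at a time; an unpaired leftover is not swapped.
        for i, j in zip(indices[0::2], indices[1::2]):
            swapped[i][swap_key], swapped[j][swap_key] = (
                swapped[j][swap_key],
                swapped[i][swap_key],
            )
    return swapped


# Hypothetical respondents: the first two share an age and form a swapping pair.
respondents = [
    {"id": 1, "age": 42, "income": "$50,000 to $60,000"},
    {"id": 2, "age": 42, "income": "$40,000 to $50,000"},
    {"id": 3, "age": 57, "income": "$20,000 to $30,000"},
]
released = swap_within_pairs(respondents, "age", "income")
```

In this sketch the first two respondents exchange income categories, while the third, who has no partner of the same age, is released unchanged.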

One benefit of data swapping is that it maintains the unweighted univariate distribution of each variable that is swapped. However, bias is introduced in univariate distributions if the sampling weights are different between the records of each swapping pair. One can imagine the impact on summaries of income categories if, in the example given, one survey respondent has a weight of 1, while the other has a weight of 1,000. A well-designed swapping approach incorporates the sampling weights into the swapping algorithm in order to limit the swapping impact on univariate and multivariate statistics.
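
The effect of unequal weights can be checked with a small calculation. The sketch below reuses the weights of 1 and 1,000 from the text; the helper name `distribution` and the record layout are assumptions made for illustration.

```python
from collections import Counter


def distribution(records, key, weighted=False):
    """Tally records by category, optionally summing sampling weights instead of counts."""
    totals = Counter()
    for rec in records:
        totals[rec[key]] += rec["weight"] if weighted else 1
    return dict(totals)


# The swapping pair before and after the income categories are exchanged.
before = [
    {"income": "$50,000 to $60,000", "weight": 1},
    {"income": "$40,000 to $50,000", "weight": 1000},
]
after = [
    {"income": "$40,000 to $50,000", "weight": 1},
    {"income": "$50,000 to $60,000", "weight": 1000},
]

unweighted_before = distribution(before, "income")
unweighted_after = distribution(after, "income")
weighted_before = distribution(before, "income", weighted=True)
weighted_after = distribution(after, "income", weighted=True)
```

The unweighted counts are identical before and after the swap (one record per category), but the weighted estimate for the $50,000 to $60,000 category moves from 1 to 1,000, and the estimate for the $40,000 to $50,000 category moves from 1,000 to 1.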

There are several variations of data swapping, including (a) directed swapping, (b) random swapping, and (c) rank swapping.

Directed swapping is a nonrandom approach in which records are handpicked for swapping. For instance, a record can be identified as having a high risk of disclosure, perhaps as determined through a matching operation with an external file, and then chosen for swapping.

Random swapping occurs when all data records are given a probability of selection and a sample of target records is then drawn at random. The sampling can be done using any approach, including simple random sampling, probability proportionate to size sampling, stratified random sampling, and so on. Once the target records are selected, a swapping partner with similar attributes is found for each. The goal is to add uncertainty to all data records, not just those identified as having a high risk of disclosure, since the records flagged for directed swapping may not cover every high-risk situation.

Finally, rank swapping is a similar method in which the paired records do not match exactly on the selected characteristics but are close in the ranking of those characteristics. This approach was developed for swapping continuous variables.
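
Random swapping and rank swapping can be sketched as follows. Both functions are simplified illustrations, not published algorithms: the names `random_swap` and `rank_swap`, the `rate` and `window` parameters, the use of simple random sampling to pick targets, and the sample data are all assumptions.

```python
import random


def random_swap(records, match_key, swap_key, rate, seed=0):
    """Draw a simple random sample of targets and swap each with a matching partner."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # copy so the input is left untouched
    targets = rng.sample(range(len(out)), round(rate * len(out)))
    used = set()
    for i in targets:
        if i in used:
            continue
        # Partners must agree on match_key and must not have been swapped already.
        partners = [
            j for j in range(len(out))
            if j != i and j not in used and out[j][match_key] == out[i][match_key]
        ]
        if not partners:
            continue  # no similar record available; leave the target as it is
        j = rng.choice(partners)
        out[i][swap_key], out[j][swap_key] = out[j][swap_key], out[i][swap_key]
        used.update((i, j))
    return out


def rank_swap(values, window, seed=0):
    """Swap each value with an unswapped value at most `window` ranks above it."""
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices by rank
    result = list(values)
    used = set()
    for pos, i in enumerate(order):
        if i in used:
            continue
        candidates = [j for j in order[pos + 1 : pos + 1 + window] if j not in used]
        if not candidates:
            continue  # no unswapped neighbor in the window; keep the value
        j = rng.choice(candidates)
        result[i], result[j] = values[j], values[i]
        used.update((i, j))
    return result


# Hypothetical data for illustration only.
people = [
    {"age": 30, "income": "A"}, {"age": 30, "income": "B"}, {"age": 30, "income": "C"},
    {"age": 40, "income": "D"}, {"age": 40, "income": "E"}, {"age": 50, "income": "F"},
]
swapped_people = random_swap(people, "age", "income", rate=0.5, seed=1)

incomes = [31000, 87000, 45500, 52000, 39000, 120000, 47000, 61000]
swapped_incomes = rank_swap(incomes, window=2, seed=1)
```

In both sketches the set of released values is the same as the set of original values; only their assignment to records changes. In `rank_swap`, a smaller `window` keeps swapped values closer to the originals at the cost of less perturbation.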

The complexities of sample surveys add to the challenge of balancing reduced disclosure risk against data quality. Multi-stage sample designs with questionnaires at more than one level (e.g., prisons, inmates) give rise to hierarchical data releases that may require identity protection for each file. Longitudinal studies sometimes involve adding new samples and/or new data items over the course of several data collections. Data swapping may be incorporated in longitudinal studies to ensure that all newly collected data are protected. Also in survey sampling, data-swapping strategies incorporate sampling weights by forming swapping partners that minimize or reduce the amount of bias introduced through the swapping process.
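
One way to take sampling weights into account when forming swapping partners is to prefer, among otherwise similar records, the candidate whose weight is closest to the target's. The text does not prescribe a specific rule, so the sketch below is an assumption, as are the function name `closest_weight_partner` and the record layout.

```python
def closest_weight_partner(target, candidates, match_key):
    """Return the matching candidate whose sampling weight is nearest the target's."""
    eligible = [
        c for c in candidates
        if c is not target and c[match_key] == target[match_key]
    ]
    if not eligible:
        return None  # no record matches on match_key
    # A small weight difference limits how far weighted totals shift after the swap.
    return min(eligible, key=lambda c: abs(c["weight"] - target["weight"]))


# Hypothetical records for illustration only.
target = {"id": 1, "age": 42, "weight": 1}
pool = [
    target,
    {"id": 2, "age": 42, "weight": 1000},
    {"id": 3, "age": 42, "weight": 3},
    {"id": 4, "age": 57, "weight": 1},
]
partner = closest_weight_partner(target, pool, "age")
```

Here the record with weight 3 is chosen over the record with weight 1,000, even though both match the target on age, and the record with an identical weight but a different age is not considered.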

...
