Data swapping, first introduced by Tore Dalenius and Steven Reiss in the late 1970s, is a perturbation method used for statistical disclosure control. The objective of data swapping is to reduce the risk that anyone can identify a respondent and his or her responses to questionnaire items by examining publicly released microdata or tables while preserving the amount of data and its usefulness.

In general, data swapping is implemented by creating pairs of records with similar attributes and then interchanging identifying or sensitive data values within each pair. As a simple example, suppose two survey respondents form a "swapping pair" because they are the same age, and suppose income categories are highly identifying and are therefore swapped to reduce the chance of disclosure. The first respondent makes between $50,000 and $60,000 annually, and the second makes between $40,000 and $50,000. After swapping, the first respondent is assigned the income category of $40,000 to $50,000, and the second respondent is assigned $50,000 to $60,000.
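The pairing-and-interchange step in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `swap_within_pairs`, the record layout, and the rule of pairing consecutive records with identical ages are all hypothetical choices made here, not part of any production swapping procedure.

```python
from collections import defaultdict

def swap_within_pairs(records, match_key, swap_key):
    """Pair records that share match_key and exchange their swap_key values."""
    # Group record positions by the matching attribute (here, age).
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[match_key]].append(idx)

    # Work on copies so the original records are left untouched.
    swapped = [dict(rec) for rec in records]
    for indices in groups.values():
        # Pair off records two at a time; an unpaired leftover is unchanged.
        for a, b in zip(indices[0::2], indices[1::2]):
            swapped[a][swap_key] = records[b][swap_key]
            swapped[b][swap_key] = records[a][swap_key]
    return swapped

# Hypothetical respondents mirroring the example in the text.
respondents = [
    {"id": 1, "age": 34, "income": "$50,000-$60,000"},
    {"id": 2, "age": 34, "income": "$40,000-$50,000"},
    {"id": 3, "age": 51, "income": "$70,000-$80,000"},
]
released = swap_within_pairs(respondents, "age", "income")
for rec in released:
    print(rec["id"], rec["age"], rec["income"])
```

Respondents 1 and 2 exchange income categories because they share an age, while respondent 3 has no partner and is released unchanged.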

One benefit of data swapping is that it maintains the unweighted univariate distribution of each variable that is swapped. However, bias is introduced in univariate distributions if the sampling weights are different between the records of each swapping pair. One can imagine the impact on summaries of income categories if, in the example given, one survey respondent has a weight of 1, while the other has a weight of 1,000. A well-designed swapping approach incorporates the sampling weights into the swapping algorithm in order to limit the swapping impact on univariate and multivariate statistics.
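The effect of unequal weights can be checked numerically. The sketch below uses the weights of 1 and 1,000 from the example; the record layout, the shortened category labels, and the helper names `unweighted_counts` and `weighted_totals` are assumptions made for illustration.

```python
from collections import Counter

def unweighted_counts(records):
    """Number of records in each income category."""
    return Counter(r["income"] for r in records)

def weighted_totals(records):
    """Sum of sampling weights in each income category."""
    totals = Counter()
    for r in records:
        totals[r["income"]] += r["weight"]
    return totals

# The swapping pair before and after incomes are exchanged.
# Weights stay with the respondent; only the income value moves.
before = [
    {"id": 1, "income": "50-60k", "weight": 1},
    {"id": 2, "income": "40-50k", "weight": 1000},
]
after = [
    {"id": 1, "income": "40-50k", "weight": 1},
    {"id": 2, "income": "50-60k", "weight": 1000},
]

# The unweighted distribution is identical before and after the swap.
print(unweighted_counts(before) == unweighted_counts(after))

# The weighted distribution is not: the two category totals trade places.
print(weighted_totals(before)["50-60k"], weighted_totals(after)["50-60k"])
```

Each category still contains one record after the swap, but the weighted estimate for the $50,000 to $60,000 category moves from 1 to 1,000, which is the bias the paragraph describes.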

There are several variations of data swapping, including (a) directed swapping, (b) random swapping, and (c) rank swapping.

Directed swapping is a nonrandom approach in which records are handpicked for swapping. For instance, a record can be identified as having a high risk of disclosure, perhaps through a matching operation with an external file, and then chosen for swapping.

Random swapping occurs when all data records are given a probability of selection and a sample is then selected at random. The sampling can be done using any approach, including simple random sampling, probability proportional to size sampling, stratified random sampling, and so on. Once the target records are selected, a swapping partner with similar attributes is found for each. The goal is to add uncertainty to all data records, not just those identified as having a high risk of disclosure, because the records flagged for directed swapping may not cover every high-risk situation.

Finally, rank swapping is a similar method in which the paired records do not match exactly on the selected characteristics but are close in the ranking of those characteristics. This approach was developed for swapping continuous variables.
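Rank swapping can be illustrated with a short sketch. The version below is one simple reading of the idea, not a reference implementation: it sorts the values of a continuous variable, then swaps each not-yet-swapped value with a randomly chosen partner no more than `window` ranks above it. The function name `rank_swap`, the `window` parameter, and the fixed random seed are assumptions introduced here.

```python
import random

def rank_swap(values, window, seed=0):
    """Swap each value with a partner at most `window` ranks above it."""
    rng = random.Random(seed)
    # Record positions ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = list(values)
    used = set()
    for rank, i in enumerate(order):
        if i in used:
            continue  # already swapped as someone else's partner
        # Candidate partners: unswapped records within the rank window.
        candidates = [j for j in order[rank + 1: rank + 1 + window]
                      if j not in used]
        if not candidates:
            continue  # no partner available; value is left as is
        j = rng.choice(candidates)
        result[i], result[j] = values[j], values[i]
        used.update((i, j))
    return result

# Hypothetical continuous variable, e.g. annual income in dollars.
incomes = [41000, 87500, 52300, 39900, 61250, 58000, 102000, 47800]
swapped_incomes = rank_swap(incomes, window=2)
print(swapped_incomes)
```

Because values are only exchanged, the released variable contains exactly the same set of values as the original, and no record receives a value more than `window` ranks away from its own. A smaller window keeps swapped values closer to the originals at the cost of less perturbation.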

The complexities of sample surveys add to the challenge of balancing reduced disclosure risk against data quality. Multi-stage sample designs with questionnaires at more than one level (e.g., prisons and inmates) give rise to hierarchical data releases that may require identity protection for each file. Longitudinal studies sometimes involve adding new samples, new data items, or both over the course of several data collections, and data swapping may be incorporated in such studies to ensure that all newly collected data are protected. In survey sampling, data-swapping strategies also incorporate sampling weights by forming swapping partners in a way that reduces the bias introduced through the swapping process.
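One simple way to bring sampling weights into partner selection is to prefer, among eligible partners, the record whose weight is closest to that of the target record. The sketch below shows only this heuristic; the function name `closest_weight_partner`, the record fields, and the closest-weight rule itself are assumptions chosen for illustration, and actual swapping procedures may use different criteria.

```python
def closest_weight_partner(target, candidates, match_key):
    """Pick the matching candidate whose sampling weight is nearest the target's."""
    eligible = [
        c for c in candidates
        if c[match_key] == target[match_key] and c["id"] != target["id"]
    ]
    if not eligible:
        return None  # no record shares the matching attribute
    # A small weight difference limits the shift in weighted totals.
    return min(eligible, key=lambda c: abs(c["weight"] - target["weight"]))

# Hypothetical file: three respondents aged 34 with very different weights.
sample = [
    {"id": 1, "age": 34, "income": "50-60k", "weight": 1},
    {"id": 2, "age": 34, "income": "40-50k", "weight": 1000},
    {"id": 3, "age": 34, "income": "30-40k", "weight": 2},
    {"id": 4, "age": 51, "income": "70-80k", "weight": 1},
]
partner = closest_weight_partner(sample[0], sample, "age")
print(partner["id"], partner["weight"])
```

For respondent 1 (weight 1), the heuristic selects respondent 3 (weight 2) rather than respondent 2 (weight 1,000), so swapping their income categories changes the weighted totals only slightly.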

...
