Record Linkage

Neil J. Salkind

doi:10.4135/9781412952644

Edited by:
Neil J. Salkind
In: Encyclopedia of Measurement and Statistics
Chapter DOI: https://doi.org/10.4135/9781412952644.n378
Subject: Anthropology, Business and Management, Criminology and Criminal Justice, Communication and Media Studies, Counseling and Psychotherapy, Economics, Education, Geography, Health, History, Marketing, Nursing, Political Science and International Relations, Psychology, Social Policy and Public Policy, Social Work, Sociology, Science, Technology, Computer Science, Engineering, Mathematics, Medicine

Record linkage, or exact matching, refers to linking together two or more databases that cover a single population. The U.S. Bureau of the Census uses record linkage in its efforts to estimate the population undercount of the decennial census. The two files that the Census Bureau links are a sample of the decennial census and a second, independent enumeration of the population in the areas covered by that sample. Some individuals are counted in both the census and the second enumeration, whereas others are absent from one or both of the canvasses. Suppose that the numbers of individuals enumerated are as given in Table 1. The question marks indicate counts that are not known; in particular, people missed by both enumerations appear in neither file and so cannot be counted directly.

Table 1 Counts From Two Enumerations Based on Record Linkage
			Second Enumeration
		Yes	No	Total
Census	Yes	$n_{yy}$	$n_{yn}$	$n_{\text{census}}$
Enumeration	No	$n_{ny}$	?	?
	Total	$n_{\text{second}}$	?	?
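The observable cells of Table 1 can be tallied once linkage has decided which records refer to the same person. In this minimal sketch, each linked person is represented by a shared key; the keys, the tiny lists, and the function name are hypothetical, chosen only to show which counts can and cannot be computed from the two files.

```python
# Hypothetical person keys: assume record linkage has already assigned the same
# key to a census-sample record and a second-enumeration record for one person.
census = {"p01", "p02", "p03", "p04", "p05"}
second = {"p03", "p04", "p05", "p06"}


def observed_cells(census_keys: set, second_keys: set) -> dict:
    """Return the cells of Table 1 that can be counted from the two lists."""
    n_yy = len(census_keys & second_keys)  # counted in both enumerations
    n_yn = len(census_keys - second_keys)  # counted in the census only
    n_ny = len(second_keys - census_keys)  # counted in the second enumeration only
    # People missed by both enumerations are in neither set, so the "No/No"
    # cell and every total that includes it cannot be computed here.
    return {
        "n_yy": n_yy,
        "n_yn": n_yn,
        "n_ny": n_ny,
        "n_census": n_yy + n_yn,
        "n_second": n_yy + n_ny,
    }


print(observed_cells(census, second))
```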

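The total size of the population can be estimated if certain assumptions are made about the two enumeration efforts and the population. Under the standard assumptions of capture-recapture models (for example, that the population does not change between the two enumerations and that being counted in one enumeration is independent of being counted in the other), the total population size can be estimated as $\hat{N} = n_{\text{census}} n_{\text{second}} / n_{yy}$. If 250 people were counted in the census sample, 200 were counted in the second enumeration, and 125 were common to both lists, then the population size would be estimated as 250(200)/125 = 400. However, if only 100 people were common to both lists, then one would estimate the population size to be 250(200)/100 = 500. Because the estimate depends directly on $n_{yy}$, errors in deciding which records refer to the same person carry through to the estimated population size.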
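The arithmetic above can be written as a small Python function. This is a minimal sketch of the capture-recapture estimator (often called the Lincoln-Petersen estimator) applied to the example counts; the function name is hypothetical, and the sketch assumes the number of matches has already been determined by record linkage.

```python
def capture_recapture_estimate(n_census: int, n_second: int, n_yy: int) -> float:
    """Capture-recapture estimate of total population size.

    n_census: people counted in the census sample
    n_second: people counted in the second enumeration
    n_yy:     people linked as appearing in both lists
    """
    if n_yy <= 0:
        # With no linked records the estimator is undefined.
        raise ValueError("n_yy must be positive")
    return n_census * n_second / n_yy


# Same census and second-enumeration counts, two different numbers of matches.
for matches in (125, 100):
    estimate = capture_recapture_estimate(250, 200, matches)
    print(f"{matches} matches -> estimated population {estimate:.0f}")
```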

Record linkage is challenging when the sizes of the files being linked are very large and unique identifying information on every individual is not available. Examples of unique identifiers (IDs) include Social Security numbers (SSNs); U.S. passport numbers; state driver's license numbers; and, except for identical twins, a person's genetic code. The decennial census does not collect SSNs or any other unique ID number. The number of people in the census undercount sample is a few hundred thousand. Thus, record linkage in this context needs to be computerized and automated.
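For a rough sense of scale, suppose (hypothetically) that each of the two files held 300,000 records. If every record in one file were compared with every record in the other, the number of candidate pairs would be

$$300{,}000 \times 300{,}000 = 9 \times 10^{10},$$

far more than could be reviewed by hand. The file size here is an illustrative assumption, not a figure reported for the census.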

Entries in the two databases are compared on the fields of information common to both files. Consider the following hypothetical records from two files, called File A and File B:

File A Record	File B Record
Wayne Feller	W. A. Fuller
Male, Married, Age 70	Male, Married, Age 71
202 Snedecor Rd.	202 Snedecor, Apt. 3
Ames, Iowa	Aimes, IA

Although these records differ in several fields, they could correspond to the same person. Alternate versions of names and addresses, nicknames and abbreviations, and misspellings and typographical errors are frequently encountered in large population databases. The U.S. Bureau of the Census and other U.S. and foreign statistical agencies use sophisticated methods to address these and other challenges.
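The comparison of the File A and File B records can be sketched field by field. This is a minimal illustration, not the method used by the Census Bureau: it assumes the records are already split into the fields shown, uses Python's standard-library `difflib.SequenceMatcher` as a stand-in string comparator, and the abbreviation table, field names, and one-year age tolerance are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration only; real systems rely on
# much larger standardization lists for names, streets, and places.
ABBREVIATIONS = {"ia": "iowa"}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and expand known abbreviations."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] of two normalized strings (1.0 means identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def compare_records(rec_a: dict, rec_b: dict, age_tolerance: int = 1) -> dict:
    """Return one comparison score per field common to both files."""
    return {
        "name": string_similarity(rec_a["name"], rec_b["name"]),
        "sex": float(rec_a["sex"] == rec_b["sex"]),
        "marital": float(rec_a["marital"] == rec_b["marital"]),
        # Hypothetical tolerance: treat ages within one year as agreeing.
        "age": float(abs(rec_a["age"] - rec_b["age"]) <= age_tolerance),
        "street": string_similarity(rec_a["street"], rec_b["street"]),
        "city": string_similarity(rec_a["city"], rec_b["city"]),
    }


file_a = {"name": "Wayne Feller", "sex": "Male", "marital": "Married", "age": 70,
          "street": "202 Snedecor Rd.", "city": "Ames, Iowa"}
file_b = {"name": "W. A. Fuller", "sex": "Male", "marital": "Married", "age": 71,
          "street": "202 Snedecor, Apt. 3", "city": "Aimes, IA"}

for field, score in compare_records(file_a, file_b).items():
    print(f"{field:8s} {score:.2f}")
```

The name and street scores fall short of 1.0 even though the records may describe the same person, so turning such a vector of scores into a match decision still requires a principled rule.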

...
