Automated Linking of Historical Data

Cited by: 88
Authors
Abramitzky, Ran [1, 2]
Boustan, Leah [2, 3]
Eriksson, Katherine [2, 4]
Feigenbaum, James [2, 5]
Perez, Santiago [2, 4]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] NBER, Cambridge, MA 02138 USA
[3] Princeton Univ, Princeton, NJ 08544 USA
[4] Univ Calif Davis, Davis, CA USA
[5] Boston Univ, Boston, MA 02215 USA
Keywords
INTERGENERATIONAL OCCUPATIONAL-MOBILITY; ECONOMIC OUTCOMES; GREAT MIGRATION; UNITED-STATES; SELF-SELECTION; AGE; CENSUS; IMMIGRANTS; SAMPLE; POOR
DOI
10.1257/jel.20201599
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5 percent) false positive rates. The automated methods trace out a frontier illustrating the trade-off between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
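The two quantities that define the frontier described in the abstract can be illustrated with a minimal sketch. It assumes a benchmark set of links (for example, hand links) is treated as ground truth, that the false positive rate is the share of proposed links that disagree with the benchmark, and that the match rate is the share of base records that receive a link. The function name, argument names, and toy data are hypothetical and are not taken from the authors' code or Stata commands.

```python
def linkage_rates(proposed, benchmark, n_base):
    """Summarize a set of proposed links against a benchmark.

    proposed:  dict mapping base-record id -> linked target-record id
    benchmark: dict mapping base-record id -> target id treated as correct
               (assumption: e.g. hand links taken as ground truth)
    n_base:    number of base records the method attempted to link
    """
    if n_base <= 0:
        raise ValueError("n_base must be positive")
    n_links = len(proposed)
    # A proposed link counts as a false positive if the benchmark has a
    # different target, or no target at all, for that base record.
    n_false = sum(1 for b, t in proposed.items() if benchmark.get(b) != t)
    return {
        "match_rate": n_links / n_base,
        "false_positive_rate": n_false / n_links if n_links else 0.0,
        "true_match_rate": (n_links - n_false) / n_base,
    }


# Toy data, invented for illustration only.
proposed = {1: "a", 2: "b", 3: "x", 4: "d"}
benchmark = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
rates = linkage_rates(proposed, benchmark, n_base=10)
```

In this toy case four of ten base records are linked and one of the four links disagrees with the benchmark. A more conservative method would typically propose fewer links with a lower share of disagreements, which is the trade-off the abstract refers to.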
Pages: 865-918
Page count: 54