Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records

Cited by: 102
Authors
Enamorado, Ted [1]
Fifield, Benjamin [1]
Imai, Kosuke [2, 3]
Affiliations
[1] Princeton Univ, Dept Polit, Princeton, NJ 08544 USA
[2] Harvard Univ, Dept Govt, Cambridge, MA 02138 USA
[3] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
REGRESSION-ANALYSIS; FILE LINKING; LINKAGE; HEALTH
DOI
10.1017/S0003055418000783
Chinese Library Classification
D0 [Political Science, Political Theory]
Discipline classification codes
0302; 030201
Abstract
Since most social science research relies on multiple data sources, merging data sets is an essential part of researchers' workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are severe especially when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical model of probabilistic record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology.
Pages: 353-371
Page count: 19