A scaling approach to record linkage

Cited by: 14
Authors
Goldstein, Harvey [1, 3]
Harron, Katie [2 ]
Cortina-Borja, Mario [3 ]
Affiliations
[1] Univ Bristol, Bristol, Avon, England
[2] London Sch Hyg & Trop Med, London, England
[3] UCL, London, England
Funding
Wellcome Trust (UK)
Keywords
scaling; record linkage; data linkage; correspondence analysis
DOI
10.1002/sim.7287
Chinese Library Classification (CLC) code
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
With increasing availability of large datasets derived from administrative and other sources, there is an increasing demand for the successful linking of these to provide rich sources of data for further analysis. Variation in the quality of identifiers used to carry out linkage means that existing approaches are often based upon 'probabilistic' models, which are based on a number of assumptions, and can make heavy computational demands. In this paper, we suggest a new approach to classifying record pairs in linkage, based upon weights (scores) derived using a scaling algorithm. The proposed method does not rely on training data, is computationally fast, requires only moderate amounts of storage and has intuitive appeal. Copyright © 2017 John Wiley & Sons, Ltd.
Pages: 2514-2521
Number of pages: 8