A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

Cited by: 13
Authors
Ong, Toan C. [1]
Duca, Lindsey M. [2]
Kahn, Michael G. [1]
Crume, Tessa L. [1]
Affiliations
[1] Univ Colorado, Sch Med, Dept Pediat, Anschutz Med Campus, 13611 East Colfax, Suite 210, Aurora, CO 80045 USA
[2] Univ Colorado, Colorado Sch Publ Hlth, Dept Epidemiol, Anschutz Med Campus, Aurora, CO 80045 USA
Keywords
record linkage; data harmonization; patient matching; congenital heart disease; hybrid; LINKING; IMPLEMENTATION; IDENTIFIERS
DOI
10.1093/jamia/ocz232
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The disjointed healthcare system and the lack of a universal patient identifier across systems necessitate accurate record linkage (RL). We describe the implementation and evaluation of a hybrid RL method in a statewide surveillance system for congenital heart disease.
Materials and Methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic methods and 1 probabilistic RL method, using first name, last name, social security number, date of birth, and house number, were first implemented independently and then sequentially in a hybrid approach to assess RL performance.
Results: A total of 16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine resulted in 6451 linkages and 18%-24% more correct linked pairs than the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.
Discussion: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any independent linkage technique.
Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
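The sequential routine described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the two deterministic rules, the agreement weights, or the decision threshold, so the rules in `deterministic_match`, the m/u probabilities in `M_U`, and the value of `THRESHOLD` are all hypothetical choices made for illustration. Only the linkage fields and the blocking key (first 3 characters of last name plus gender) come from the abstract. The probabilistic stage uses a basic Fellegi-Sunter style log-weight sum, which is one common way to score candidate pairs.

```python
import math
from itertools import combinations

# Linkage fields named in the abstract.
FIELDS = ["first", "last", "ssn", "dob", "house"]

# Hypothetical (m, u) probabilities per field: m = P(agree | same person),
# u = P(agree | different people). These values are assumptions.
M_U = {
    "first": (0.90, 0.01),
    "last": (0.90, 0.01),
    "ssn": (0.95, 0.0001),
    "dob": (0.95, 0.001),
    "house": (0.80, 0.05),
}
THRESHOLD = 8.0  # assumed cutoff on the summed log2 weight


def deterministic_match(a, b):
    """Assumed rules: exact SSN agreement, or exact first + last + DOB."""
    if a["ssn"] and a["ssn"] == b["ssn"]:
        return True
    return all(bool(a[f]) and a[f] == b[f] for f in ("first", "last", "dob"))


def block_key(record):
    """Blocking key from the abstract: 3 initial characters of last name, gender."""
    return (record["last"][:3].lower(), record["gender"])


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights; missing values add nothing."""
    total = 0.0
    for f in FIELDS:
        if not a[f] or not b[f]:
            continue
        m, u = M_U[f]
        total += math.log2(m / u) if a[f] == b[f] else math.log2((1 - m) / (1 - u))
    return total


def hybrid_link(records):
    """Return index pairs (i, j), i < j, linked by either stage."""
    links = set()
    # Stage 1: deterministic rules over all record pairs.
    for i, j in combinations(range(len(records)), 2):
        if deterministic_match(records[i], records[j]):
            links.add((i, j))
    # Stage 2: probabilistic scoring of the remaining pairs within each block.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if (i, j) not in links and match_weight(records[i], records[j]) >= THRESHOLD:
                links.add((i, j))
    return links
```

Running the deterministic stage first means that pairs with exact identifier agreement are linked regardless of blocking, while the probabilistic stage can still recover pairs with a spelling variant or a missing identifier. A production routine would also need string-similarity comparators, estimated rather than hand-set weights, and a more scalable pairing strategy than the all-pairs loop in stage 1.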
Pages: 505-513
Number of pages: 9
Related papers
50 items in total
  • [21] Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party
    Lazrig, Ibrahim
    Ong, Toan C.
    Ray, Indrajit
    Ray, Indrakshi
    Jiang, Xiaoqian
    Vaidya, Jaideep
    2018 16TH ANNUAL CONFERENCE ON PRIVACY, SECURITY AND TRUST (PST), 2018, : 75 - 84
  • [22] A novel approach to improve the Record Linkage process
    Benkhaled, Hamid Naceur
    Berrabah, Djamel
    Boufares, Faouzi
    2019 6TH INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT 2019), 2019, : 1504 - 1509
  • [23] Efficient and Practical Approach for Private Record Linkage
    Yakout, Mohamed
    Atallah, Mikhail J.
    Elmagarmid, Ahmed
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2012, 3 (03): : 1 - 28
  • [24] A practical approach for scalable record linkage on Hadoop
    Yang, Fengyu
    Chen, Ying
    Zhang, Ye
    MATERIALS PROCESSING AND MANUFACTURING III, PTS 1-4, 2013, 753-755 : 3018 - 3024
  • [25] Linkage of multiple electronic health record datasets using a 'spine linkage' approach compared with all 'pairwise linkages'
    Blake, Helen A.
    Sharples, Linda D.
    Harron, Katie
    van der Meulen, Jan H.
    Walker, Kate
    INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2023, 52 (01) : 214 - 226
  • [26] Using probabilistic record linkage and propensity-score matching to identify a community-based comparison population
    Holland, Margaret L.
    Taylor, Rose M.
    Condon, Eileen
    Rinne, Gabrielle R.
    Bleicher, Sarah
    Seldin, Margaret L.
    Sadler, Lois S.
    Li, Connie
    RESEARCH IN NURSING & HEALTH, 2022, 45 (03) : 390 - 400
  • [27] Parallel corpus approach for name matching in record linkage
    Sukharev, Jeffrey
    Zhukov, Leonid
    Popescul, Alexandrin
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 995 - 1000
  • [29] The Challenge of Pairing Big Datasets: Probabilistic Record Linkage Methods and Diagnosis of Their Empirical Viability
    Peng Y.
    Mation L.F.
    Journal of Business Cycle Research, 2020, 16 (1) : 35 - 57
  • [30] A HIERARCHICAL BAYESIAN APPROACH TO RECORD LINKAGE AND POPULATION SIZE PROBLEMS
    Tancredi, Andrea
    Liseo, Brunero
    ANNALS OF APPLIED STATISTICS, 2011, 5 (2B): : 1553 - 1585