A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

Cited by: 13
Authors
Ong, Toan C. [1]
Duca, Lindsey M. [2]
Kahn, Michael G. [1]
Crume, Tessa L. [1]
Affiliations
[1] Univ Colorado, Sch Med, Dept Pediat, Anschutz Med Campus, 13611 East Colfax, Suite 210, Aurora, CO 80045 USA
[2] Univ Colorado, Colorado Sch Publ Hlth, Dept Epidemiol, Anschutz Med Campus, Aurora, CO 80045 USA
Keywords
record linkage; data harmonization; patient matching; congenital heart disease; hybrid; LINKING; IMPLEMENTATION; IDENTIFIERS;
DOI
10.1093/jamia/ocz232
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: The disjointed healthcare system and the absence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid record linkage method in a statewide surveillance system for congenital heart disease.
Materials and Methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic methods and 1 probabilistic RL method using first name, last name, social security number, date of birth, and house number were initially implemented independently and then sequentially in a hybrid approach to assess RL performance.
Results: A total of 16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine resulted in 6451 linkages and an additional 18%-24% correct linked pairs compared to the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.
Discussion: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any single linkage technique.
Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
Pages: 505-513
Page count: 9
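
The sequential hybrid routine summarized in the abstract (a deterministic pass followed by a blocked probabilistic pass) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The deterministic rule (exact agreement on social security number and date of birth), the Fellegi-Sunter style m/u probabilities, the threshold, and all function and field names are hypothetical values chosen for illustration. Only the matching fields and the blocking key (first 3 characters of last name plus gender) come from the abstract.

```python
from itertools import combinations
from math import log2

# Hypothetical m/u probabilities per field (illustrative, not from the paper).
# m = P(field agrees | true match); u = P(field agrees | non-match).
M_U = {
    "first_name": (0.90, 0.01),
    "last_name": (0.90, 0.01),
    "ssn": (0.95, 0.0001),
    "dob": (0.95, 0.001),
    "house_number": (0.80, 0.01),
}
THRESHOLD = 10.0  # hypothetical cut-off on the summed log2 weight


def deterministic_match(a, b):
    """Assumed rule: exact agreement on a non-empty SSN and on date of birth."""
    return bool(a["ssn"]) and a["ssn"] == b["ssn"] and a["dob"] == b["dob"]


def block_key(rec):
    """Blocking key from the abstract: first 3 characters of last name, plus gender."""
    return (rec["last_name"][:3].upper(), rec["gender"])


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights over the matching fields.

    Assumption: a missing (empty) value is treated as a disagreement.
    """
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] and a[field] == b[field]:
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total


def hybrid_link(records):
    """Return a set of (i, j) index pairs, i < j, linked by either step."""
    linked = set()

    # Step 1: deterministic pass over all record pairs.
    for i, j in combinations(range(len(records)), 2):
        if deterministic_match(records[i], records[j]):
            linked.add((i, j))

    # Step 2: probabilistic pass, restricted to pairs that share a block
    # and were not already linked deterministically.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if (i, j) not in linked and match_weight(records[i], records[j]) >= THRESHOLD:
                linked.add((i, j))

    return linked
```

In this sketch the probabilistic step can recover pairs that the deterministic rule misses, for example when one record lacks a social security number but agrees on name, date of birth, and house number. The all-pairs deterministic loop is quadratic and is kept only for brevity; a production routine would index on the deterministic key instead.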