A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

Cited by: 13
Authors
Ong, Toan C. [1]
Duca, Lindsey M. [2]
Kahn, Michael G. [1]
Crume, Tessa L. [1]
Affiliations
[1] Univ Colorado, Sch Med, Dept Pediat, Anschutz Med Campus,13611 East Colfax,Suite 210, Aurora, CO 80045 USA
[2] Univ Colorado, Colorado Sch Publ Hlth, Dept Epidemiol, Anschutz Med Campus, Aurora, CO 80045 USA
Keywords
record linkage; data harmonization; patient matching; congenital heart disease; hybrid; LINKING; IMPLEMENTATION; IDENTIFIERS
DOI
10.1093/jamia/ocz232
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The disjointed healthcare system and the absence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid record linkage method in a statewide surveillance system for congenital heart disease. Materials and Methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic RL methods and 1 probabilistic RL method using first name, last name, social security number, date of birth, and house number were first implemented independently and then sequentially in a hybrid approach to assess RL performance. Results: A total of 16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each consisting of 2 records linked together within or across data sources). Probabilistic RL, using the 3 initial characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine yielded 6451 linkages and an additional 18%-24% correct linked pairs compared with the independent methods. The hybrid linkage routine also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently. Discussion: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any single linkage technique used independently. Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
Pages: 505-513
Number of pages: 9
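
The sequential routine summarized in the abstract (deterministic linkage first, then probabilistic linkage within blocks built from the first 3 characters of last name plus gender) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the deterministic rule (exact agreement on social security number and date of birth), the Fellegi-Sunter-style agreement weights in `FIELD_PROBS`, the `THRESHOLD` value, and the toy `records`. The `evaluate` helper computes recall and F-measure against a known set of true pairs, the metrics the abstract reports.

```python
import math
from itertools import combinations

# Hypothetical (m, u) probabilities per field; illustrative values only.
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.95, 0.005),
    "ssn": (0.98, 0.0001),
    "dob": (0.97, 0.001),
    "house_number": (0.80, 0.02),
}
THRESHOLD = 8.0  # hypothetical cut-off on the summed log2 weight


def deterministic_match(a, b):
    """Assumed rule: SSN present and identical, and date of birth identical."""
    return bool(a["ssn"]) and a["ssn"] == b["ssn"] and a["dob"] == b["dob"]


def block_key(rec):
    """Blocking key: first 3 characters of last name plus gender."""
    return (rec["last_name"][:3].upper(), rec["gender"])


def match_weight(a, b):
    """Sum log2 agreement/disagreement weights over fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if not a[field] or not b[field]:
            continue  # a missing value contributes no evidence
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def hybrid_link(records):
    """Return a set of linked index pairs (i, j) with i < j."""
    linked = set()
    # Pass 1: deterministic linkage over all pairs.
    for i, j in combinations(range(len(records)), 2):
        if deterministic_match(records[i], records[j]):
            linked.add((i, j))
    # Pass 2: probabilistic linkage within blocks, for pairs not yet linked.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if (i, j) in linked:
                continue
            if match_weight(records[i], records[j]) >= THRESHOLD:
                linked.add((i, j))
    return linked


def evaluate(predicted, truth):
    """Precision, recall, and F-measure of predicted pairs against true pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy records (invented): 0-2 are one person, 3 is someone else in the same block.
records = [
    {"first_name": "Ana", "last_name": "Garcia", "ssn": "123456789",
     "dob": "2001-02-03", "house_number": "12", "gender": "F"},
    {"first_name": "Anna", "last_name": "Garcia", "ssn": "123456789",
     "dob": "2001-02-03", "house_number": "12", "gender": "F"},
    {"first_name": "Ana", "last_name": "Garcia", "ssn": "",
     "dob": "2001-02-03", "house_number": "12", "gender": "F"},
    {"first_name": "Bo", "last_name": "Garner", "ssn": "987654321",
     "dob": "1999-09-09", "house_number": "7", "gender": "F"},
]
pairs = hybrid_link(records)
```

In this toy data, record 2 has no social security number, so the assumed deterministic rule cannot link it; the probabilistic pass picks it up from agreement on the remaining fields. That mirrors, in miniature, the abstract's point that the hybrid routine recovers pairs a single method would miss.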