Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis

Cited by: 20
Authors
de Oliveira, Gisele Pinto [1]
de Souza Bierrenbach, Ana Luiza [2]
de Camargo Junior, Kenneth Rochel [3]
Coeli, Claudia Medina [4]
Pinheiro, Rejane Sobrino [4]
Affiliations
[1] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Programa Posgrad Saude Colet, Rio De Janeiro, RJ, Brazil
[2] Hosp Sirio Libanes, Inst Ensino & Pesquisa, Sao Paulo, SP, Brazil
[3] Univ Estado Rio de Janeiro, Inst Med Social, Rio De Janeiro, RJ, Brazil
[4] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Rio De Janeiro, RJ, Brazil
Source
REVISTA DE SAUDE PUBLICA | 2016 / Vol. 50
Keywords
Tuberculosis; epidemiology; Data Accuracy; Sensitivity and Specificity; Epidemiological Surveillance; statistics & numerical data;
DOI
10.1590/S1518-8787.2016050006327
Chinese Library Classification number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, and to describe the characteristics of discordant pairs.
METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, each combining three or more fragments of the key variables, with or without modification (Soundex or substring). The probabilistic approach required a score cutoff above which links would be automatically classified as belonging to the same individual. The cutoff was obtained by linking the Notifiable Diseases Information System - Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated to assess accuracy.
RESULTS: Sensitivity was 87.2% for probabilistic and 95.2% for deterministic record linkage; specificity was 99.8% and 99.9%, respectively. Missing values in the key variables and low similarity scores for name and date of birth were the main reasons the techniques failed to identify records of the same individual.
CONCLUSIONS: The two techniques showed a high level of correlation for pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User need and experience should be considered when choosing which technique to use.
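As a rough illustration of the evaluation step described in the abstract, the sketch below classifies candidate pairs as links when their score reaches a cutoff, then computes sensitivity and specificity against a manual-review gold standard. The scores, labels, function names, and the use of Youden's J to pick the cutoff are all assumptions made for this example; the abstract states only that ROC and precision-recall curves were used, not which criterion selected the cutoff.

```python
from typing import List, Tuple


def confusion_counts(scores: List[float], is_duplicate: List[bool],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when pairs scoring >= cutoff are classified as links."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_duplicate):
        predicted = score >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def best_cutoff(scores: List[float], is_duplicate: List[bool]) -> Tuple[float, float, float]:
    """Scan observed scores as candidate cutoffs and keep the one that
    maximizes Youden's J = sensitivity + specificity - 1.
    Youden's J is an assumed criterion, not one named in the abstract."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(
            *confusion_counts(scores, is_duplicate, cutoff))
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical linkage scores and manual-review labels (True = same individual).
toy_scores = [12.0, 9.5, 8.0, 7.5, 3.0, 2.0, 1.0, -1.0]
toy_truth = [True, True, True, True, False, True, False, False]

cutoff, sens, spec = best_cutoff(toy_scores, toy_truth)
print(f"cutoff={cutoff}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With real data, the gold-standard labels would come from the manual review of candidate pairs, and the same two functions could be applied to the deterministic classifier by treating "matched by any rule" as the predicted link.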
Pages: 9
Related papers
50 in total
  • [21] Diagnostic Accuracy of the INSHI Consensus Case Definition for the Diagnosis of Paradoxical Tuberculosis-IRIS
    Stek, Cari
    Buyze, Jozefien
    Menten, Joris
    Schutz, Charlotte
    Thienemann, Friedrich
    Blumenthal, Lisette
    Maartens, Gary
    Boyles, Tom
    Wilkinson, Robert J.
    Meintjes, Graeme
    Lynen, Lutgarde
    JAIDS-JOURNAL OF ACQUIRED IMMUNE DEFICIENCY SYNDROMES, 2021, 86 (05) : 587 - 592
  • [22] Record Linkage in Studies of Cerebrovascular Disease in Oxford, England
    Acheson, Roy M.
    Fairbairn, Anthony S.
    STROKE, 1971, 2 (01) : 48 - 57
  • [23] Tuberculosis and AIDS Co-Morbidity in Brazil: Linkage of the Tuberculosis and AIDS Databases
    Miranda, Angelica Espinosa
    Golub, Jonathan E.
    Lucena, Francisca de Fatima
    Maciel, Ethel Noia
    Gurgel, Maria de Fatima
    Dietze, Reynaldo
    BRAZILIAN JOURNAL OF INFECTIOUS DISEASES, 2009, 13 (02) : 137 - 141
  • [24] Unilateral testicular tuberculosis: case report
    Hadadi, A.
    Pourmand, G.
    Mehdipour-Aghabagher, B.
    ANDROLOGIA, 2012, 44 (01) : 70 - 72
  • [25] Assessment of a method for automatic match classification in probabilistic data linkage
    Pereira Duarte, Daniela de Almeida
    Lima Correa, Camila Soares
    Fayer, Vivian Assis
    Nogueira, Mario Cirio
    Bustamante-Teixeira, Maria Teresa
    CADERNOS DE SAUDE PUBLICA, 2019, 35 (11):
  • [26] Lessons in Linkage: combining administrative data using deterministic linkage for surveillance of sports and recreation injuries in Florida, United States
    Baker, Charlotte
    Nottingham, Quinton
    Holloway, Jonathan
    INTERNATIONAL JOURNAL OF POPULATION DATA SCIENCE (IJPDS), 2022, 7 (01):
  • [27] Record-linkage studies of the coexistence of epilepsy and bipolar disorder
    Wotton, Clare J.
    Goldacre, Michael J.
    SOCIAL PSYCHIATRY AND PSYCHIATRIC EPIDEMIOLOGY, 2014, 49 (09) : 1483 - 1488
  • [28] Record-linkage procedures in epidemiology: an Italian multicentre study
    Fornari, Carla
    Madotto, Fabiana
    Demaria, Moreno
    Romanelli, Anna
    Pepe, Pasquale
    Raciti, Mauro
    Tancioni, Valeria
    Chini, Francesco
    Trerotoli, Paolo
    Bartolomeo, Nicola
    Serio, Gabriella
    Cesana, Giancarlo
    Corrao, Giovanni
    EPIDEMIOLOGIA & PREVENZIONE, 2008, 32 (03) : 79 - 88
  • [29] Record Linkage for Malaria Deaths Data Recovery and Surveillance in Brazil
    Sabino Garcia, Klauss Kleydmann
    Xavier, Danielly Batista
    Soremekun, Seyi
    Abrahao, Amanda Amaral
    Drakeley, Chris
    Ramalho, Walter Massa
    Siqueira, Andre M.
    TROPICAL MEDICINE AND INFECTIOUS DISEASE, 2023, 8 (12)
  • [30] Pregnancy outcomes in women with endometriosis: a national record linkage study
    Saraswat, L.
    Ayansina, D. T.
    Cooper, K. G.
    Bhattacharya, S.
    Miligkos, D.
    Horne, A. W.
    Bhattacharya, S.
    BJOG-AN INTERNATIONAL JOURNAL OF OBSTETRICS AND GYNAECOLOGY, 2017, 124 (03) : 444 - 452