Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis

Cited by: 20
Authors
de Oliveira, Gisele Pinto [1]
de Souza Bierrenbach, Ana Luiza [2]
de Camargo Junior, Kenneth Rochel [3]
Coeli, Claudia Medina [4]
Pinheiro, Rejane Sobrino [4]
Affiliations
[1] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Programa Posgrad Saude Colet, Rio De Janeiro, RJ, Brazil
[2] Hosp Sirio Libanes, Inst Ensino & Pesquisa, Sao Paulo, SP, Brazil
[3] Univ Estado Rio de Janeiro, Inst Med Social, Rio De Janeiro, RJ, Brazil
[4] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Rio De Janeiro, RJ, Brazil
Source
REVISTA DE SAUDE PUBLICA | 2016 / Volume 50
Keywords
Tuberculosis; epidemiology; Data Accuracy; Sensitivity and Specificity; Epidemiological Surveillance; statistics & numerical data;
DOI
10.1590/S1518-8787.2016050006327
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, as well as the characteristics of discordant pairs. METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, based on combinations of fragments of the key variables, with or without modification (Soundex or substring). Each rule was formed by three or more fragments. The probabilistic approach required a cutoff point for the score, above which links would be automatically classified as belonging to the same individual. The cutoff point was obtained by linking the Notifiable Diseases Information System - Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated to assess accuracy. RESULTS: Sensitivity was 87.2% for probabilistic and 95.2% for deterministic record linkage; specificity was 99.8% and 99.9%, respectively. Missing values for the key variables and low similarity scores for name and date of birth were the main reasons the techniques failed to identify records of the same individual. CONCLUSIONS: The two techniques showed a high level of correlation for pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User needs and experience should be considered when choosing the technique to use.
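As an illustration of the kind of rule the abstract describes, the following is a minimal Python sketch of one deterministic rule built from three fragments (Soundex of the first name, Soundex of the last name, and a substring of the date of birth), together with the sensitivity and specificity calculation. It is not the authors' algorithm: the abstract does not list the 70 rules or name all key variables, so the field names `name` and `birth_date`, the choice of fragments, and the helper names `soundex`, `fragments`, `rule_match`, and `sensitivity_specificity` are all assumptions made for this example.

```python
import unicodedata

# Standard Soundex consonant groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Four-character Soundex code; returns "" for an empty name."""
    # Strip accents so that Portuguese names map onto plain ASCII letters.
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    letters = [c for c in plain.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def fragments(record):
    """Hypothetical rule: Soundex(first name), Soundex(last name), birth year."""
    tokens = record.get("name", "").split()
    first = soundex(tokens[0]) if tokens else ""
    last = soundex(tokens[-1]) if tokens else ""
    year = record.get("birth_date", "")[:4]  # substring fragment, assumes YYYY-MM-DD
    return (first, last, year)


def rule_match(a, b):
    """Two records match only if every fragment is present and equal."""
    return all(x and x == y for x, y in zip(fragments(a), fragments(b)))


def sensitivity_specificity(predicted, truth):
    """predicted/truth: parallel lists of booleans, one entry per candidate pair."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up records for illustration only.
a = {"name": "Maria de Souza", "birth_date": "1980-05-17"}
b = {"name": "Maria de Sousa", "birth_date": "1980-05-17"}  # spelling variant
c = {"name": "Maria de Souza", "birth_date": ""}            # missing birth date

same_ab = rule_match(a, b)  # spelling variant is absorbed by Soundex
same_ac = rule_match(a, c)  # missing key variable blocks the match
```

In this sketch a missing fragment always prevents a match, which mirrors the abstract's observation that missing values in the key variables were a main cause of unidentified duplicates. A fuller rule set would combine many such rules, each with different fragments, and compare the resulting pair classifications against a manually reviewed reference using `sensitivity_specificity`.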
Pages: 9
Related papers
50 in total
  • [1] An Introduction to Probabilistic Record Linkage with a Focus on Linkage Processing for WTC Registries
    Asher, Jana
    Resnick, Dean
    Brite, Jennifer
    Brackbill, Robert
    Cone, James
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2020, 17 (18) : 1 - 16
  • [2] Accuracy of a probabilistic record-linkage methodology used to track blood donors in the Mortality Information System database
    Capuani, Ligia
    Bierrenbach, Ana Luiza
    Abreu, Fatima
    Takecian, Pedro Losco
    Ferreira, Joao Eduardo
    Sabino, Ester Cerdeira
    CADERNOS DE SAUDE PUBLICA, 2014, 30 (08) : 1623 - 1632
  • [3] Underreporting of tuberculosis in the Information System on Notifiable Diseases (SINAN): primary default and case detection from additional data sources using probabilistic record linkage
    Pinheiro, Rejane Sobrino
    Andrade, Vanusa de Lemos
    de Oliveira, Gisele Pinto
    CADERNOS DE SAUDE PUBLICA, 2012, 28 (08) : 1559 - 1568
  • [4] Improved quality of tuberculosis data using record linkage
    Bartholomay, Patricia
    de Oliveira, Gisele Pinto
    Pinheiro, Rejane Sobrino
    Nogales Vasconcelos, Ana Maria
    CADERNOS DE SAUDE PUBLICA, 2014, 30 (11) : 2459 - 2469
  • [5] Probabilistic record linkage and a method to calculate the positive predictive value
    Blakely, T
    Salmond, C
    INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2002, 31 (06) : 1246 - 1252
  • [6] Evaluation of the accuracy of tuberculosis surveillance system by comparison with medical record in Korea
    Park, Yoon-Sung
    Kim, HeeJin
    Kang, Hae-Young
    Cho, Seunghee
    Cho, Enhi
    Jee, Hee-Jung
    An, Hyonggin
    EUROPEAN RESPIRATORY JOURNAL, 2013, 42
  • [7] Accuracy, potential, and limitations of probabilistic record linkage in identifying deaths by gender identity and sexual orientation in the state of Rio De Janeiro, Brazil
    Rafael, Ricardo de Mattos Russo
    da Silva, Kleison Pereira
    Santos, Helena Goncalves de Souza
    Depret, Davi Gomes
    Caravaca-Morera, Jaime Alonso
    Breda, Karen Marie Lucas
    BMC PUBLIC HEALTH, 2024, 24 (01)
  • [8] Sensitivity of probabilistic record linkage for reported birth identification: Pro-Saude Study
    da Matta Coutinho, Renata Gutierrez
    Coeli, Claudia Medina
    Faerstein, Eduardo
    Chor, Dora
    REVISTA DE SAUDE PUBLICA, 2008, 42 (06)
  • [9] Accuracy of Probabilistic Linkage Using the Enhanced Matching System for Public Health and Epidemiological Studies
    Aldridge, Robert W.
    Shaji, Kunju
    Hayward, Andrew C.
    Abubakar, Ibrahim
    PLOS ONE, 2015, 10 (08)
  • [10] Anonymous non-response analysis in the ABCD cohort study enabled by probabilistic record linkage
    Tromp, M.
    van Eijsden, M.
    Ravelli, A. C. J.
    Bonsel, G. J.
    PAEDIATRIC AND PERINATAL EPIDEMIOLOGY, 2009, 23 (03) : 264 - 272