Accuracy of Probabilistic Linkage Using the Enhanced Matching System for Public Health and Epidemiological Studies

Cited by: 57
Authors
Aldridge, Robert W. [1, 2]
Shaji, Kunju [2]
Hayward, Andrew C. [1]
Abubakar, Ibrahim [2, 3, 4]
Affiliations
[1] UCL, Institute of Health Informatics, London, England
[2] Public Health England, Centre for Infectious Disease Surveillance and Control, London, England
[3] UCL, Department of Infection and Population Health, London, England
[4] UCL, MRC Clinical Trials Unit, London, England
Funding
National Institutes of Health (United States); Wellcome Trust (United Kingdom)
Keywords
RECORD-LINKAGE; TUBERCULOSIS; CARE;
DOI
10.1371/journal.pone.0136179
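Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]
Subject classification codes
07; 0710; 09
Abstract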
Background: The Enhanced Matching System (EMS) is a probabilistic record linkage program developed by the tuberculosis section at Public Health England to identify records belonging to the same individual across two datasets. This paper outlines how EMS works and investigates its accuracy for linkage across public health datasets.
Methods: EMS is a configurable Microsoft SQL Server database program. To examine its accuracy, two public health databases were first matched exactly on National Health Service (NHS) number, used as a gold-standard unique identifier. Probabilistic linkage was then performed on the same two datasets without NHS number. Sensitivity analyses examined the effect of varying the parameters of the matching process.
Results: Exact matching on NHS number between the two datasets (containing 5931 and 1759 records) identified 1071 matched pairs; EMS probabilistic linkage identified 1068 record pairs. Against the gold standard, probabilistic linkage had a sensitivity of 99.5% (95% CI: 98.9, 99.8), specificity of 100.0% (95% CI: 99.9, 100.0), positive predictive value (PPV) of 99.8% (95% CI: 99.3, 100.0), and negative predictive value (NPV) of 99.9% (95% CI: 99.8, 100.0). Probabilistic matching was most accurate when address variables were included and the automatically generated threshold for determining links was used together with manual review.
Conclusion: With the establishment of national electronic datasets across health and social care, EMS enables previously unanswerable research questions to be tackled with confidence in the accuracy of the linkage process. Where a small sample is matched into a very large database (such as national records of hospital attendance), the positive predictive value or sensitivity may be lower than in this analysis, depending on the prevalence of matches between the databases.
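The prevalence effect described above can be illustrated with Bayes' rule. The following is a minimal Python sketch, assuming sensitivity and specificity stay fixed while only the prevalence of true matches among the compared units changes; the function name `ppv` and all numbers are hypothetical and are not results from this study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule.

    prevalence: proportion of compared units that are true matches.
    All inputs are illustrative assumptions, not EMS outputs.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    se, sp = 0.995, 0.999  # hypothetical values, not results from the paper
    for prev in (0.5, 0.1, 0.01, 0.001):
        print(f"prevalence={prev:<6} PPV={ppv(se, sp, prev):.3f}")
```

With these illustrative values, PPV stays close to 0.99 at a prevalence of 0.1 but falls to about 0.5 at a prevalence of 0.001, because false positives drawn from the much larger pool of non-matches begin to rival the true matches. The sketch holds sensitivity fixed; in practice a stricter linkage threshold might be chosen to protect PPV, lowering sensitivity instead, and that trade-off is not modelled here.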
Despite this possible limitation, probabilistic linkage has great potential where exact matching on a common identifier is not possible, including in low-income settings and for vulnerable groups such as homeless populations, where the absence of unique identifiers and lower data quality have historically hindered the identification of individuals across datasets.
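The abstract does not specify how EMS scores candidate record pairs. As a generic illustration of probabilistic record linkage, the sketch below follows the classic Fellegi–Sunter approach. Each compared field contributes a log-likelihood-ratio weight, the weights are summed, and two thresholds split pairs into links, non-links and a manual-review band. The field names, the m- and u-probabilities, and the thresholds are all hypothetical. EMS's actual configuration, including its automatically generated threshold, may differ.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (assumed values):
# m = P(field agrees | records truly match), u = P(field agrees | non-match).
# Assumes 0 < m < 1 and 0 < u < 1 so every log weight is finite.
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.98, 0.5),
    "postcode": (0.80, 0.001),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 agreement weight if the field agrees, disagreement weight otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(a: dict, b: dict) -> float:
    """Sum field weights; a field missing from either record contributes 0."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue
        # Simple exact comparison after trimming and upper-casing.
        agrees = str(va).strip().upper() == str(vb).strip().upper()
        total += field_weight(agrees, m, u)
    return total


def classify(weight: float, lower: float, upper: float) -> str:
    """Three-way decision with a clerical-review band (assumes lower < upper)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"


if __name__ == "__main__":
    rec_a = {"surname": "Smith", "first_name": "Anna",
             "date_of_birth": "1980-02-01", "sex": "F", "postcode": "NW1 2DB"}
    rec_b = {"surname": "SMITH", "first_name": "Ana",
             "date_of_birth": "1980-02-01", "sex": "F", "postcode": None}
    w = match_weight(rec_a, rec_b)
    print(f"weight={w:.2f} decision={classify(w, lower=0.0, upper=15.0)}")
```

In this example the two records agree on surname, date of birth and sex, disagree on first name, and one record lacks a postcode. The summed weight therefore falls between the two hypothetical thresholds, and the pair goes to manual review.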
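The four accuracy measures reported in the Results come from a 2×2 comparison of probabilistic links against the NHS-number gold standard. Below is a minimal Python sketch of that calculation. The counts are hypothetical, and the Wilson score interval is one common choice for 95% confidence intervals; the paper may have used a different interval method.

```python
import math
from dataclasses import dataclass


@dataclass
class LinkageCounts:
    tp: int  # linked by both the gold standard and the probabilistic run
    fp: int  # linked probabilistically but not by the gold standard
    fn: int  # gold-standard links missed by the probabilistic run
    tn: int  # correctly left unlinked by both


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy(c: LinkageCounts) -> dict[str, tuple[float, float, float]]:
    """Return {metric: (estimate, lower, upper)} for the four measures."""
    parts = {
        "sensitivity": (c.tp, c.tp + c.fn),
        "specificity": (c.tn, c.tn + c.fp),
        "ppv": (c.tp, c.tp + c.fp),
        "npv": (c.tn, c.tn + c.fn),
    }
    out = {}
    for name, (k, n) in parts.items():
        lo, hi = wilson_ci(k, n)
        out[name] = (k / n, lo, hi)
    return out


if __name__ == "__main__":
    counts = LinkageCounts(tp=90, fp=5, fn=10, tn=895)  # hypothetical counts
    for name, (est, lo, hi) in accuracy(counts).items():
        print(f"{name:<12} {est:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

Each measure is a simple proportion with its own denominator. This explains why sensitivity and PPV can differ even when both are computed from the same linkage run.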
Pages: 15