Automatic generation of case-detection algorithms to identify children with asthma from large electronic health record databases

Cited by: 31
Authors
Afzal, Zubair [1]
Engelkes, Marjolein [1]
Verhamme, Katia M. C. [1]
Janssens, Hettie M. [2]
Sturkenboom, Miriam C. J. M. [1]
Kors, Jan A. [1]
Schuemie, Martijn J. [1]
Affiliations
[1] Erasmus Univ, Med Ctr, Dept Med Informat, NL-3000 CA Rotterdam, Netherlands
[2] Erasmus Univ, Med Ctr, Sophia Childrens Hosp, Dept Pediat, NL-3000 CA Rotterdam, Netherlands
Keywords
case-detection algorithms; machine learning; electronic medical records; automated case definition; pharmacoepidemiology; CLASSIFICATION; MONTELUKAST; MANAGEMENT;
DOI
10.1002/pds.3438
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Purpose: Most electronic health record databases contain unstructured free-text narratives, which cannot be easily analyzed. Case-detection algorithms are usually created manually and often rely only on coded information such as International Classification of Diseases, Ninth Revision (ICD-9) codes. We applied a machine-learning approach to generate and evaluate an automated case-detection algorithm that uses both free-text and coded information to identify asthma cases.

Methods: The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5-18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set.

Results: The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 when identifying definite, probable, and doubtful asthma cases.

Conclusions: The automated algorithm shows good performance in detecting cases of asthma using both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database. Copyright (C) 2013 John Wiley & Sons, Ltd.
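The abstract reports PPV, sensitivity, and specificity against a manually annotated set and mentions over-sampling to balance performance. The following is a minimal sketch of how such quantities could be computed and how a simple random over-sampling step might look. It is not the authors' implementation: the function names `confusion_metrics` and `oversample_positives`, the `factor` parameter, and the toy labels are all hypothetical, and the record does not say which over-sampling method the paper actually used.

```python
import random


def confusion_metrics(predicted, reference):
    """Return PPV, sensitivity, and specificity for two boolean label lists.

    `reference` plays the role of the manual annotation, `predicted` the
    output of a case-detection algorithm (both hypothetical here).
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives

    def ratio(num, den):
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def oversample_positives(records, labels, factor, seed=0):
    """Randomly duplicate positive records until they appear `factor` times as often.

    This is plain random over-sampling with replacement, chosen only for
    illustration; it is an assumption, not the method used in the paper.
    """
    rng = random.Random(seed)
    positives = [rec for rec, lab in zip(records, labels) if lab]
    extra = rng.choices(positives, k=(factor - 1) * len(positives)) if positives else []
    return list(records) + extra, list(labels) + [True] * len(extra)


# Toy data, invented for this example only.
reference = [True, True, False, False, False]
predicted = [True, False, True, False, False]
metrics = confusion_metrics(predicted, reference)

records, labels = oversample_positives(["a", "b", "c"], [True, False, False], factor=3)
```

Giving the positive class more weight in training generally pushes a rule learner toward higher sensitivity at the cost of PPV, which is one way a trade-off like the one reported in the Results could be tuned.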
Pages: 826-833
Number of pages: 8