Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore

Times cited: 6
Authors
Hardjojo, Antony [1]
Gunachandran, Arunan [1]
Pang, Long [1]
Bin Abdullah, Mohammed Ridzwan [1]
Wah, Win [1]
Chong, Joash Wen Chen [1]
Goh, Ee Hui [1]
Teo, Sok Huang [2]
Lim, Gilbert [3]
Lee, Mong Li [3]
Hsu, Wynne [3]
Lee, Vernon [1]
Chen, Mark I-Cheng [1,4]
Wong, Franco [2,5]
Phang, Jonathan Siung King [2,5]
Affiliations
[1] Natl Univ Singapore, Natl Univ Hlth Syst, Saw Swee Hock Sch Publ Hlth, MD1,10-01 12 Sci Dr 2, Singapore 117549, Singapore
[2] Natl Healthcare Grp Polyclin, Singapore, Singapore
[3] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[4] Natl Ctr Infect Dis, Singapore, Singapore
[5] Natl Univ Polyclin, Singapore, Singapore
Funding
Medical Research Council (UK)
Keywords
natural language processing; communicable diseases; epidemiology; surveillance; syndromic surveillance; electronic health records; IDENTIFYING INFLUENZA; SYSTEM; DEFINITIONS; PERFORMANCE; OUTBREAK
DOI
10.2196/medinform.8204
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Free-text clinical records are a source of information that complements traditional disease surveillance. To harness these records electronically, they must be transformed into codified fields by natural language processing algorithms. Objective: The aim of this study was to develop, train, and validate the Clinical History Extractor for Syndromic Surveillance (CHESS), a natural language processing algorithm that extracts clinical information from free-text primary care records. Methods: CHESS is a keyword-based natural language processing algorithm that extracts 48 signs and symptoms: those suggesting respiratory or gastrointestinal infections, constitutional symptoms, and other signs and symptoms potentially associated with infectious diseases. The algorithm also captures the assertion status (affirmed, negated, or suspected) and the duration of each symptom. Electronic medical records from the National Healthcare Group Polyclinics, a major public sector primary care provider in Singapore, were randomly extracted and manually reviewed by two human reviewers, with a third reviewer acting as adjudicator. The algorithm was evaluated on 1680 notes against the human-coded results as the reference standard, with half of the data used for training and the other half for validation. Results: The symptoms most commonly present within the 1680 clinical records at the episode level were those typical of respiratory infections, such as fever (928/7703, 12.05%), cough (744/7703, 9.66%), sore throat (591/7703, 7.67%), and rhinorrhea (552/7703, 7.17%). At the episode level, CHESS achieved an overall precision of 96.7% and recall of 97.6% on the training dataset, and 96.0% precision and 93.1% recall on the validation dataset. All symptoms suggesting respiratory and gastrointestinal infections were detected with more than 90% precision and recall.
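The Methods describe keyword matching combined with assertion classification, and the Results score the output by precision and recall against a human-coded reference. The following Python sketch illustrates that general technique with NegEx-style trigger rules; it is not CHESS itself. The keyword lists, trigger words, clause-splitting rule, and the names `SYMPTOM_KEYWORDS`, `extract`, `assertion_for` and `precision_recall` are all hypothetical, and only 5 of the 48 symptoms are covered.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical lexicon covering 5 of the 48 symptoms; CHESS's real keyword
# lists are not given in this record.
SYMPTOM_KEYWORDS = {
    "cough": ("cough",),
    "fever": ("fever", "febrile"),
    "sore throat": ("sore throat", "throat pain"),
    "rhinorrhea": ("rhinorrhea", "runny nose"),
    "diarrhea": ("diarrhea", "diarrhoea", "loose stools"),
}

# Hypothetical NegEx-style trigger lists (assumptions, not CHESS's rules).
NEGATION = re.compile(r"\b(?:no|not|denies|denied|without)\b")
SUSPECTED = re.compile(r"\b(?:possible|possibly|suspected|query)\b")
TERMINATION = re.compile(r"\b(?:but|however)\b")
CLAUSE_SPLIT = re.compile(r"[.;\n]+")


@dataclass(frozen=True)
class Mention:
    symptom: str
    status: str  # "affirmed", "negated" or "suspected"


def assertion_for(prefix: str) -> str:
    """Assign an assertion status from the clause text preceding a keyword."""
    # Triggers before a termination term ("but", "however") are out of scope.
    scope = TERMINATION.split(prefix)[-1]
    if NEGATION.search(scope):
        return "negated"
    if SUSPECTED.search(scope):
        return "suspected"
    return "affirmed"


def extract(note: str) -> list[Mention]:
    """Return one Mention per symptom keyword found in each clause of a note."""
    mentions = []
    for clause in CLAUSE_SPLIT.split(note.lower()):
        for symptom, terms in SYMPTOM_KEYWORDS.items():
            for term in terms:
                match = re.search(rf"\b{re.escape(term)}\b", clause)
                if match:
                    status = assertion_for(clause[: match.start()])
                    mentions.append(Mention(symptom, status))
                    break  # count each symptom at most once per clause
    return mentions


def precision_recall(predicted: set, reference: set) -> tuple[float, float]:
    """Precision and recall of predicted items against a human-coded reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

For example, `extract("Cough and runny nose for 3 days. No fever but has sore throat. Possible diarrhoea.")` returns affirmed cough and rhinorrhea, negated fever, affirmed sore throat (the "but" ends the negation scope), and suspected diarrhea. Because clauses are split only on periods, semicolons and newlines, a note such as "denies fever, has cough" would wrongly negate cough, so a real rule set needs richer scope handling. In `precision_recall`, the compared unit (any hashable item, such as a `(note_id, symptom)` pair) is an assumption, since the record does not define how episodes were counted.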
CHESS correctly assigned the assertion status in 97.3%, 97.9%, and 89.8% of affirmed, negated, and suspected signs and symptoms, respectively (97.6% overall accuracy). Symptom episode duration was correctly identified in 81.2% of records with known duration status. Conclusions: We have developed a natural language processing algorithm, CHESS, that performs well in extracting signs and symptoms from free-text primary care clinical records. In addition to detecting the presence of symptoms, the algorithm accurately distinguishes affirmed, negated, and suspected assertion statuses and extracts symptom durations.
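The abstract reports that symptom duration was extracted but does not describe how. One common approach is pattern matching on duration expressions; the sketch below assumes two hypothetical patterns: spelled-out units ("3 days", "2 weeks") and the n/7, n/52, n/12 shorthand for days, weeks and months used in some clinical notes. Whether CHESS handles either form is not stated, the name `duration_in_days` is illustrative, and months are approximated as 30 days.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical duration patterns; CHESS's actual rules are not described here.
UNIT_DAYS = {"day": 1, "days": 1, "week": 7, "weeks": 7, "month": 30, "months": 30}
FRACTION_DAYS = {"7": 1, "52": 7, "12": 30}  # n/7 days, n/52 weeks, n/12 months

WORD_PATTERN = re.compile(r"(\d+)\s*(days?|weeks?|months?)\b")
FRACTION_PATTERN = re.compile(r"(\d+)\s*/\s*(7|52|12)\b")


def duration_in_days(text: str) -> Optional[int]:
    """Return the first recognised symptom duration in days, or None if absent."""
    text = text.lower()
    match = WORD_PATTERN.search(text)
    if match:
        return int(match.group(1)) * UNIT_DAYS[match.group(2)]
    match = FRACTION_PATTERN.search(text)
    if match:
        return int(match.group(1)) * FRACTION_DAYS[match.group(2)]
    return None
```

With these assumptions, `duration_in_days("Cough x 3/7")` gives 3 and `duration_in_days("fever for 2 weeks")` gives 14. The fraction pattern can misfire on dates such as 1/7/2017, which a production rule set would need to exclude.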
Pages: 45-59
Page count: 15