Using natural language processing to extract clinically useful information from Chinese electronic medical records

Cited by: 37
Authors
Chen, Liang [1]
Song, Liting [2]
Shao, Yue [1]
Li, Dewei [1]
Ding, Keyue [3]
Affiliations
[1] Chongqing Med Univ, Dept Hepatobiliary Surg, Affiliated Hosp 1, Chongqing, Peoples R China
[2] Chongqing Med Univ, Key Lab Mol Biol Infect Dis, Minist Educ, Affiliated Hosp 2, Inst Viral Hepatitis, Dept Infec, Chongqing, Peoples R China
[3] Henan Univ, Med Genet Inst Henan Prov, Henan Prov Peoples Hosp, Henan Key Lab Genet Dis & Funct Genom, Henan Prov, Zhengzhou, Henan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese EMRs; Cancer of liver Italian p (CLIP); Regular expression; Rule-based method; Hybrid method; HEPATOCELLULAR-CARCINOMA; PROGNOSIS;
DOI
10.1016/j.ijmedinf.2019.01.004
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Aims: To develop a natural language processing (NLP)-based algorithm for extracting clinically useful information for patients with hepatocellular carcinoma (HCC) from Chinese electronic medical records (EMRs), and to use these data for the assessment of HCC staging.
Materials and Methods: Clinical documents, including operation notes, radiology and pathology reports, of 92 HCC patients were collected from Chinese EMRs. We randomly grouped these patients into training (n = 60) and testing (n = 32) datasets. Rule-based and hybrid methods for extracting information were developed using the training set of manually annotated operation notes. The method with better performance was used to process other documents. The performance of the algorithm was assessed by calculating the precision, recall and F-score for exact-boundary and partial-boundary matching strategies. The utility of the extracted information for HCC staging was assessed by comparison with manual review.
Results: For operation notes, the rule-based and hybrid methods had a precision, recall and F-score 80% when the exact-boundary and partial-boundary matching strategies were applied to the testing dataset. With the rule-based method (which performed better than the hybrid method), three other types of documents were also processed with good performance. When the extracted clinically useful information was applied to HCC staging, the concordance rate with the manual review was 75%.
Conclusion: An NLP system was developed for clinical information extraction and HCC staging based on EMRs, and the results indicate that Chinese NLP has potential utility in clinical research.
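The evaluation named in the abstract (precision, recall and F-score under exact-boundary and partial-boundary matching) can be illustrated with a minimal sketch. This is not the authors' implementation. The span representation, the label names, the overlap-based definition of a partial match, and the helper names `exact_match`, `partial_match` and `score` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical span format: (start offset, end offset, entity label), end exclusive.
Span = Tuple[int, int, str]


def exact_match(pred: Span, gold: Span) -> bool:
    """Exact-boundary matching: offsets and label must be identical."""
    return pred == gold


def partial_match(pred: Span, gold: Span) -> bool:
    """Partial-boundary matching (assumed here to mean same label and any character overlap)."""
    same_label = pred[2] == gold[2]
    overlaps = pred[0] < gold[1] and gold[0] < pred[1]
    return same_label and overlaps


def score(preds: List[Span], golds: List[Span],
          match: Callable[[Span, Span], bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one matching strategy."""
    # Predicted spans that match at least one gold span.
    matched_pred = sum(1 for p in preds if any(match(p, g) for g in golds))
    # Gold spans that are matched by at least one predicted span.
    matched_gold = sum(1 for g in golds if any(match(p, g) for p in preds))
    precision = matched_pred / len(preds) if preds else 0.0
    recall = matched_gold / len(golds) if golds else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy annotations with made-up labels and offsets, for illustration only.
golds = [(0, 4, "tumor_size"), (10, 15, "tumor_number"), (20, 26, "ascites")]
preds = [(0, 4, "tumor_size"), (11, 15, "tumor_number"), (30, 33, "ascites")]

exact_scores = score(preds, golds, exact_match)
partial_scores = score(preds, golds, partial_match)
print("exact:  ", exact_scores)
print("partial:", partial_scores)
```

In this toy data the second prediction has a shifted start offset, so it counts only under partial-boundary matching. That is why the partial scores are higher than the exact ones.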
Pages: 6-12
Page count: 7