Mining Text for Disease Diagnosis

Cited by: 4
Authors
Tsumoto, Shusaku [1]
Kimura, Tomohiro [2]
Iwata, Haruko [2]
Hirano, Shoji [1]
Affiliations
[1] Shimane Univ, Fac Med, Dept Med Informat, 89-1 Enya Cho, Izumo, Shimane 6938501, Japan
[2] Shimane Univ, Fac Med, Med Serv Div, 89-1 Enya Cho, Izumo, Shimane 6938501, Japan
Source
5TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2017 | 2017 / Vol. 122
Funding
Japan Society for the Promotion of Science
Keywords
Discharge summary; text mining; classification; deep learning; random forest; decision tree; SVM; correspondence analysis
DOI
10.1016/j.procs.2017.11.483
Chinese Library Classification
F [Economics]
Discipline classification code
02
Abstract
Electronic patient records (EPRs) are rich in text that documents almost all of the decision-making processes of medical staff. Mining EPRs is therefore important for acquiring knowledge about decision making and diagnosis. In this paper, as a first step, we focus on text mining of discharge summaries, which include a compact explanation of the patient's admission; a record of her complaints, physical findings, laboratory results, and radiographic studies while hospitalized; a list of changes in her medications at discharge; and recommendations for follow-up care. The text mining process consists of four steps. First, morphological analysis is applied to a set of summaries and a term matrix is generated. Second, correspondence analysis is applied to the classification labels and the term matrix to generate two-dimensional coordinates; measuring the distances between the categories and the assigned points yields a ranking of keywords. Third, keywords are selected as attributes according to their rank, and training examples for classifiers are generated. Finally, learning methods are applied to the training examples. Experimental validation shows that random forest achieved the best performance, with the deep learner a close second, while decision tree methods with many keywords performed only slightly worse than neural network or deep learning methods. (C) 2017 The Authors. Published by Elsevier B.V.
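The correspondence analysis and keyword-ranking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a category-by-term count table has already been produced by morphological analysis, uses a standard SVD-based correspondence analysis with principal coordinates for both categories and terms, and ranks terms by Euclidean distance to each category point. The category names, terms, counts, and the helper names `correspondence_analysis`, `rank_keywords`, and `select_attributes` are all hypothetical.

```python
import numpy as np


def correspondence_analysis(counts, n_dims=2):
    """Return principal coordinates of rows (categories) and columns (terms)."""
    table = np.asarray(counts, dtype=float)
    P = table / table.sum()              # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U[:, :n_dims] * sigma[:n_dims]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :n_dims] * sigma[:n_dims]) / np.sqrt(c)[:, None]
    return row_coords, col_coords


def rank_keywords(row_coords, col_coords, categories, terms):
    """For each category, order terms by distance to the category point."""
    ranking = {}
    for i, category in enumerate(categories):
        dist = np.linalg.norm(col_coords - row_coords[i], axis=1)
        ranking[category] = [terms[j] for j in np.argsort(dist)]
    return ranking


def select_attributes(ranking, top_k):
    """Union of the top_k ranked terms of every category, in stable order."""
    selected = []
    for ranked_terms in ranking.values():
        for term in ranked_terms[:top_k]:
            if term not in selected:
                selected.append(term)
    return selected


# Hypothetical toy data: rows are diagnosis labels, columns are terms.
categories = ["cerebral infarction", "pneumonia", "fracture"]
terms = ["paralysis", "cough", "fever", "fall", "rehabilitation"]
counts = [
    [30, 1, 2, 2, 10],
    [1, 25, 28, 1, 2],
    [2, 1, 3, 27, 12],
]

row_coords, col_coords = correspondence_analysis(counts, n_dims=2)
ranking = rank_keywords(row_coords, col_coords, categories, terms)
attributes = select_attributes(ranking, top_k=1)

for category in categories:
    print(category, "->", ranking[category])
print("selected attributes:", attributes)
```

The selected attributes would then serve as columns of the training examples passed to a classifier such as a random forest or decision tree. The number of keywords kept per category (`top_k` here) is a free parameter of this sketch.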
Pages: 1133-1140
Number of pages: 8