A study of damp-heat syndrome classification Using Word2vec and TF-IDF

Times cited: 0
Authors
Zhu, Wei [1]
Zhang, Wei [1]
Li, Guo-Zheng [1]
He, Chong [1]
Zhang, Lei [2]
Affiliations
[1] Tongji Univ, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Source
2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2016
Keywords
Clinical record analysis; Word2vec; TF-IDF; TCM; Damp-heat syndrome Classification
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
With growing public concern about health, assessing a person's health from medical records is becoming a potential demand. Most previous disease-analysis studies were conducted on structured datasets, which usually ignore the relationships between symptoms and are expensive to obtain. In this paper, we propose a novel model based on Word2vec and Term Frequency-Inverse Document Frequency (TF-IDF) that can detect damp-heat syndrome directly from unstructured records. First, we use the ICTCLAS system, combined with a corpus collected in the field of Traditional Chinese Medicine (TCM), to segment the clinical records into words. Second, the Word2vec tool is used to train word vectors. Then, we construct the record representation vector from the word vectors and TF-IDF; we call this representation method Word2vec+TF-IDF. To verify the effectiveness of the proposed method, we compare it with other text representation methods under four different classifiers. The experiments were conducted on a dataset collected from more than 10 Chinese medicine hospitals. The experimental results show that our model performs better than state-of-the-art methods such as LSA and Doc2vec.
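The abstract does not say how the word vectors and TF-IDF scores are combined. A common reading is a TF-IDF-weighted average of the word vectors in each record, and the minimal sketch below illustrates that reading only; it is not the authors' implementation. The function names, the English placeholder tokens, and the two-dimensional `toy_vectors` (standing in for vectors that a trained Word2vec model would supply) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized record."""
    n_docs = len(docs)
    # Document frequency: number of records that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights


def record_vector(doc_weights, word_vectors, dim):
    """TF-IDF-weighted average of the word vectors in one record."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in doc_weights.items():
        vec = word_vectors.get(word)
        if vec is None:
            continue  # skip words without a vector
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no weighted words: return the zero vector
    return [x / weight_sum for x in total]


# Assumed toy data: segmented records and stand-in word vectors.
records = [
    ["tongue", "yellow", "greasy"],
    ["tongue", "pale", "thin"],
]
toy_vectors = {
    "tongue": [0.5, 0.5],
    "yellow": [1.0, 0.0],
    "greasy": [0.0, 1.0],
    "pale": [-1.0, 0.0],
    "thin": [0.0, -1.0],
}

vectors = [
    record_vector(w, toy_vectors, dim=2) for w in tfidf_weights(records)
]
```

In this toy example "tongue" occurs in every record, so its inverse document frequency is zero and it contributes nothing to either record vector. The resulting vectors could then be passed to any standard classifier, as in the comparison the abstract describes.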
Pages: 1415-1420
Page count: 6