An Improved Deep Learning Model: S-TextBLCNN for Traditional Chinese Medicine Formula Classification

Times cited: 13
Authors
Cheng, Ning [1]
Chen, Yue [1]
Gao, Wanqing [1]
Liu, Jiajun [1]
Huang, Qunfu [1]
Yan, Cheng [1,2]
Huang, Xindi [1]
Ding, Changsong [1,2]
Affiliations
[1] Hunan University of Chinese Medicine, School of Informatics, Changsha, People's Republic of China
[2] Hunan University of Chinese Medicine, Big Data Analysis Laboratory of Traditional Chinese Medicine, Changsha, People's Republic of China
Keywords
S-TextBLCNN model; deep learning; formula classification; formula-vector; data imbalance; PERFORMANCE; PREDICTION; ALGORITHMS; FRAMEWORK; LANGUAGE; WORD2VEC; NETWORK;
DOI
10.3389/fgene.2021.807825
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Purpose: This study proposes S-TextBLCNN, a model for classifying traditional Chinese medicine (TCM) formulae by efficacy. The model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which helps to further explore the internal rules of formula combination.

Methods: First, natural language processing (NLP) is applied to the TCM herbs extracted from the Chinese Pharmacopoeia to learn a quantitative representation of each herb. Three features (herb name, herb properties, and herb efficacy) are selected to encode herbs and to construct herb-vectors and formula-vectors. Then, using 2,664 stroke formulae collected from the TCM literature and 19 formula efficacy categories extracted from Yifang Jijie, an improved deep learning model, TextBLCNN, is proposed; it consists of a bidirectional long short-term memory (Bi-LSTM) network and a convolutional neural network (CNN). A binary classifier is built for each of the 19 efficacy categories to classify the TCM formulae. Finally, to address the class imbalance in the formula data, the SMOTE over-sampling method is applied, yielding the S-TextBLCNN model.

Results: The formula-vector built from herb efficacy gives the best classification performance, which suggests a strong relationship between herb efficacy and formula efficacy. TextBLCNN achieves an accuracy of 0.858 and an F1-score of 0.762, both higher than those of logistic regression (acc = 0.561, F1-score = 0.567), SVM (acc = 0.703, F1-score = 0.591), LSTM (acc = 0.723, F1-score = 0.621), and TextCNN (acc = 0.745, F1-score = 0.644). In addition, applying SMOTE to tackle data imbalance improves the F1-score by an average of 47.1% across the 19 models.

Conclusion: Combining the formula feature representation with the S-TextBLCNN model improves the accuracy of formula efficacy classification. It provides a new research direction for the study of TCM formula compatibility.
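The "S" in S-TextBLCNN is the SMOTE step applied to the imbalanced binary classifiers. The sketch below, in plain Python with only the standard library, illustrates that over-sampling idea on flat formula-vectors. It rests on stated assumptions, not the authors' code: the helper names `formula_vector` and `smote`, the herb names, and all toy vectors are hypothetical; building a formula-vector as the mean of its herb vectors is an illustrative choice (the paper may instead feed the Bi-LSTM a sequence of herb vectors); and `k = 5` simply follows the common SMOTE default rather than a setting reported in the abstract.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def formula_vector(herbs: Sequence[str], herb_vectors: Dict[str, Vector]) -> Vector:
    """HYPOTHETICAL flat formula-vector: element-wise mean of its herb vectors.

    Herbs without a vector are skipped (an assumption, not the paper's rule).
    """
    known = [herb_vectors[h] for h in herbs if h in herb_vectors]
    if not known:
        raise ValueError("none of the formula's herbs has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority-class vectors, SMOTE style.

    Each new vector lies on the line segment between a randomly chosen
    minority vector and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m, b=base: math.dist(b, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy data for ONE binary classifier: made-up 2-D herb vectors and labels.
    herb_vectors = {"herb_a": [1.0, 0.0], "herb_b": [0.0, 1.0], "herb_c": [1.0, 1.0]}
    print(formula_vector(["herb_a", "herb_b"], herb_vectors))  # [0.5, 0.5]

    positives = [[0.9, 0.1], [0.8, 0.3], [1.0, 0.2]]  # minority class (label 1)
    negatives = [[0.1, 0.9]] * 9                      # majority class (label 0)
    extra = smote(positives, n_new=len(negatives) - len(positives), k=2)
    print(len(positives) + len(extra), len(negatives))  # 9 9
```

In practice a library implementation such as imbalanced-learn's `SMOTE` (via `fit_resample`) would normally be used. Over-sampling is usually applied to the training split only, so that synthetic vectors derived from test formulae cannot leak into evaluation; whether the study followed that protocol is not stated in the abstract.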
Pages: 10