An Improved Deep Learning Model: S-TextBLCNN for Traditional Chinese Medicine Formula Classification

Cited by: 13
Authors
Cheng, Ning [1]
Chen, Yue [1]
Gao, Wanqing [1]
Liu, Jiajun [1]
Huang, Qunfu [1]
Yan, Cheng [1, 2]
Huang, Xindi [1]
Ding, Changsong [1, 2]
Affiliations
[1] Hunan Univ Chinese Med, Sch Informat, Changsha, Peoples R China
[2] Hunan Univ Chinese Med, Big Data Anal Lab Tradit Chinese Med, Changsha, Peoples R China
Keywords
S-TextBLCNN model; deep learning; formula classification; formula-vector; data imbalance; PERFORMANCE; PREDICTION; ALGORITHMS; FRAMEWORK; LANGUAGE; WORD2VEC; NETWORK
DOI
10.3389/fgene.2021.807825
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Purpose: This study proposes the S-TextBLCNN model for classifying traditional Chinese medicine (TCM) formulae by efficacy. The model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which helps in further exploring the internal rules of formula combination.

Methods: First, for the TCM herbs extracted from the Chinese Pharmacopoeia, natural language processing (NLP) is used to learn a quantitative representation of each herb. Three features (herb name, herb properties, and herb efficacy) are selected to encode herbs and to construct the herb-vector and formula-vector. Then, based on 2,664 formulae for stroke collected from the TCM literature and 19 formula efficacy categories extracted from Yifang Jijie, an improved deep learning model, TextBLCNN, is proposed; it consists of a bidirectional long short-term memory (Bi-LSTM) neural network and a convolutional neural network (CNN). Binary classifiers are established for the 19 formula efficacy categories to classify the TCM formulae. Finally, to address the imbalance in the formula data, the over-sampling method SMOTE is applied, yielding the S-TextBLCNN model.

Results: The formula-vector composed of herb efficacy gives the best classification performance, which suggests a strong relationship between herb efficacy and formula efficacy. The TextBLCNN model has an accuracy of 0.858 and an F1-score of 0.762, both higher than those of the logistic regression (acc = 0.561, F1-score = 0.567), SVM (acc = 0.703, F1-score = 0.591), LSTM (acc = 0.723, F1-score = 0.621), and TextCNN (acc = 0.745, F1-score = 0.644) models. In addition, using the over-sampling method SMOTE to tackle data imbalance improves the F1-score by an average of 47.1% across the 19 models.

Conclusion: The combination of the formula feature representation and the S-TextBLCNN model improves the accuracy of formula efficacy classification. It provides a new research idea for the study of TCM formula compatibility.
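The abstract names SMOTE as the over-sampling step but this record does not show how it was applied. The following is a minimal sketch of SMOTE-style interpolation for one binary classifier, assuming each formula is already encoded as a fixed-length list of floats and that the positive class (label 1) is the minority. The function names `smote` and `balance`, the parameter `k`, and the toy data are hypothetical and are not taken from the paper; a library such as imbalanced-learn would normally be used in practice.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample label 1 (assumed minority) until both classes are equal in size.
    Returns new lists; the inputs are not modified."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_new = max(0, len(neg) - len(pos))
    new = smote(pos, n_new, k=k, seed=seed) if n_new else []
    return X + new, y + [1] * len(new)


# Toy example: three negatives, two positives (hypothetical 2-D vectors).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 1, 1]
X_balanced, y_balanced = balance(X, y)
```

In this sketch, over-sampling would be applied to the training split only, once per efficacy category, so that synthetic samples never leak into evaluation data.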
Pages: 10