Traditional Chinese Medicine Symptom Normalization Approach Based on Pre-Trained Language Models

Authors
Xie Y. [1, 2]
Tao H. [1, 2]
Jia Q. [1, 2]
Yang S. [1, 2]
Han X. [2]
Affiliations
[1] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing
[2] Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing
Source
Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications | 2022 / Vol. 45 / No. 04
Keywords
Entity matching; Pre-trained language model; Semantic classification; Symptom normalization; Traditional Chinese medicine
DOI
10.13190/j.jbupt.2021-191
Abstract
To address two problems in traditional Chinese medicine (TCM), namely that the same symptom appears under many different literal descriptions and that a single symptom description can correspond to multiple normalized symptoms, a two-stage framework based on pre-trained language models is proposed. In the first stage, a multi-label text classification model, whose labels follow the definition and classification of symptoms, semantically partitions each symptom description and yields candidate normalized symptom terms. In the second stage, an entity matching model scores and ranks these candidates, and several strategies are designed to perform a second recall over the results to improve performance. The highest-scoring candidate under each semantic label is then taken as the normalization result. Experimental results show that the proposed method outperforms traditional methods on the symptom normalization problem. The study further compares and analyzes the results obtained with different pre-trained language models on this task to verify the effectiveness of the proposed method. © 2022, Editorial Department of Journal of Beijing University of Posts and Telecommunications. All rights reserved.
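The two-stage procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword lookup in `classify_labels` stands in for the pre-trained multi-label classifier, `difflib.SequenceMatcher` stands in for the pre-trained entity matching model, and every label, keyword, and candidate term is hypothetical toy data. The second-recall strategies are omitted because the abstract does not describe them.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# Hypothetical stand-ins: the paper uses pre-trained language models for both
# stages; a keyword lookup and a character-overlap ratio are used here only so
# the sketch runs without any model weights.


def classify_labels(description: str, label_keywords: Dict[str, List[str]]) -> List[str]:
    """Stage 1 stand-in: multi-label classification by keyword lookup.

    Returns every semantic label whose keywords appear in the description,
    so one description can receive several labels.
    """
    return [
        label
        for label, keywords in label_keywords.items()
        if any(keyword in description for keyword in keywords)
    ]


def match_score(description: str, candidate: str) -> float:
    """Stage 2 stand-in for the entity matching model: a similarity in [0, 1]."""
    return SequenceMatcher(None, description, candidate).ratio()


def normalize(
    description: str,
    label_keywords: Dict[str, List[str]],
    candidates_by_label: Dict[str, List[str]],
    scorer: Callable[[str, str], float] = match_score,
) -> Dict[str, str]:
    """Map each predicted semantic label to its highest-scoring normalized term."""
    result: Dict[str, str] = {}
    for label in classify_labels(description, label_keywords):
        candidates = candidates_by_label.get(label, [])
        if not candidates:
            continue  # label predicted, but no normalized terms registered for it
        # Score and sort the candidates (the second recall is omitted here).
        ranked = sorted(candidates, key=lambda c: scorer(description, c), reverse=True)
        result[label] = ranked[0]  # keep only the top candidate per label
    return result


if __name__ == "__main__":
    # Hypothetical toy data: 头痛 headache, 头晕 dizziness, 失眠 insomnia, 嗜睡 somnolence.
    label_keywords = {"head": ["头"], "sleep": ["眠", "睡"]}
    candidates_by_label = {"head": ["头痛", "头晕"], "sleep": ["失眠", "嗜睡"]}
    # "晚上失眠，头有点痛" roughly means "insomnia at night, head hurts a little".
    print(normalize("晚上失眠，头有点痛", label_keywords, candidates_by_label))
    # → {'head': '头痛', 'sleep': '失眠'}
```

Because `scorer` is a parameter, a trained matching model that returns a relevance score for a (description, candidate) pair could be substituted without changing the per-label selection logic.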
Pages: 13-18, 57