Text Mining and Analysis of Treatise on Febrile Diseases Based on Natural Language Processing

Authors
Kai Zhao [1]
Na Shi [1]
Zhen Sa [1]
HuaXing Wang [1]
ChunHua Lu [2]
XiaoYing Xu [1]
Affiliations
[1] School of Traditional Chinese Medicine, Beijing University of Chinese Medicine
[2] School of Life Science, Beijing University of Chinese Medicine
Keywords
Knowledge discovery; natural language processing; text mining; traditional Chinese medicine literature; Treatise on Febrile Diseases
DOI
Not available
Chinese Library Classification (CLC) number
R441.3 [Fever]; TP391.1 [Text information processing]
Discipline classification codes
100208; 081203; 0835
Abstract
Objective: This paper applies natural language processing (NLP) to text mining of traditional Chinese medicine (TCM) literature, using NLP techniques to analyze and process the text of "Treatise on Febrile Diseases" (TFDs) in order to extract important information.
Materials and Methods: The experiment was implemented in Python using NLP toolkits including Jieba, nltk, gensim, and sklearn, together with Excel and Word. The text of "TFDs" was cleaned, segmented, and stripped of stop words in sequence; word frequency statistics and analysis, keyword extraction, named entity recognition (NER), and other operations were then performed, and finally text similarity was calculated.
Results: Jieba accurately identified herbal names in "TFDs." Word frequency statistics based on the word segmentation showed that "warm therapy" is an important treatment in "TFDs" and that Guizhi decoction is the main prescription; five core decoctions were identified. Keyword extraction based on the term frequency-inverse document frequency (TF-IDF) algorithm performed well. The accuracy of NER on "TFDs" was about 86%. When similarity was calculated with a latent semantic indexing model, "Understanding of Synopsis of Golden Chamber (SGC)" was much more similar to "SGC" than to "TFDs." The results met expectations.
Conclusions: This work lays a research foundation for applying NLP to text mining of unstructured TCM literature. Combined with deep learning, NLP, as an important branch of artificial intelligence, will have broader application prospects in text mining of TCM literature, construction of TCM knowledge graphs, and TCM knowledge services.
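The stop word removal, word frequency, and TF-IDF keyword steps named in the abstract can be illustrated with a minimal standard-library sketch. It assumes segmentation (done with Jieba in the paper) has already produced token lists. The tokens, the stop word list, and the function names below are hypothetical toy values, not data or code from the paper, and the unsmoothed TF-IDF formula is a textbook variant that may differ from what the libraries used in the paper compute.

```python
import math
from collections import Counter

# Hypothetical, already-segmented clauses (toy data, not quoted from the paper).
docs = [
    ["太阳病", "发热", "汗出", "恶风", "桂枝汤", "主之"],
    ["太阳病", "头痛", "发热", "身疼", "麻黄汤", "主之"],
    ["阳明病", "发热", "汗出", "口渴", "白虎汤", "主之"],
]
# Hypothetical stop word list.
STOP_WORDS = {"主之"}


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def word_frequencies(token_lists):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def tf_idf(token_lists):
    """Return one {term: score} dict per document.

    tf  = count of term in document / document length
    idf = ln(number of documents / number of documents containing term)
    """
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return scores


cleaned = [remove_stop_words(d, STOP_WORDS) for d in docs]
freqs = word_frequencies(cleaned)
scores = tf_idf(cleaned)

# Highest-scoring terms of the first clause act as its keywords.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:2]
print(freqs.most_common(3))
print(top_terms)
```

A term that occurs in every clause of this toy set, such as "发热", receives a TF-IDF score of zero, while a term confined to one clause, such as a decoction name, scores highest. This is why frequency counts and TF-IDF keywords can highlight different terms in the same text.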
Pages: 67-73
Number of pages: 7