TCMChat: A generative large language model for traditional Chinese medicine

Cited: 0
Authors
Dai, Yizheng [1,2]
Shao, Xin [1,2,3]
Zhang, Jinlu [1,2]
Chen, Yulong [2]
Chen, Qian [1,2,3]
Liao, Jie [1,2]
Chi, Fei [2]
Zhang, Junhua [4]
Fan, Xiaohui [1,2,3,5]
Affiliations
[1] Zhejiang Univ, Pharmaceut Informat Inst, Coll Pharmaceut Sci, Hangzhou 310058, Peoples R China
[2] Zhejiang Univ, Innovat Ctr Yangtze River Delta, State Key Lab Chinese Med Modernizat, Jiaxing 314103, Peoples R China
[3] Ningbo Municipal Hosp TCM, Joint Lab Clin Multiomics Res Zhejiang Univ & Ning, Ningbo 315000, Peoples R China
[4] Tianjin Univ Tradit Chinese Med, State Key Lab Chinese Med Modernizat, Tianjin 301617, Peoples R China
[5] Zhejiang Univ, Womens Hosp, Sch Med, Zhejiang Key Lab Precis Diag & Therapy Major Gynec, Hangzhou 310006, Peoples R China
Keywords
Traditional Chinese medicine; Large language model; Dialogue system; Pre-training; Supervised fine-tuning;
DOI
10.1016/j.phrs.2024.107530
Chinese Library Classification
R9 [Pharmacy];
Discipline Code
1007;
Abstract
The use of ground-breaking large language models (LLMs) coupled with dialogue systems has become increasingly prevalent in the medical domain. Nevertheless, the expertise of LLMs in Traditional Chinese Medicine (TCM) remains limited, despite several recently proposed TCM LLMs. Here, we introduce TCMChat (https://xomics.com.cn/tcmchat), a generative LLM built through pre-training (PT) and supervised fine-tuning (SFT) on large-scale curated TCM text knowledge and Chinese question-answering (QA) datasets. Specifically, we first compiled a customized training set covering six Chinese-medicine scenarios through text mining and manual verification: TCM knowledge base, multiple-choice questions, reading comprehension, entity extraction, medical case diagnosis, and herb or formula recommendation. We then performed PT and SFT, using Baichuan2-7B-Chat as the foundation model. Benchmarking datasets and case studies further demonstrate the superior performance of TCMChat compared with existing models. Our code, data, and model are publicly released on GitHub (https://github.com/ZJUFanLab/TCMChat) and HuggingFace (https://huggingface.co/ZJUFanLab), providing a high-quality knowledge base for TCM modernization research along with a user-friendly dialogue web tool.
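The six-scenario SFT corpus described in the abstract can be sketched as single-turn chat records in JSON Lines form. The field names and scenario tags below are illustrative assumptions for the sketch, not the authors' actual schema:

```python
import json

# Hypothetical scenario tags mirroring the six training scenarios named in
# the abstract; the labels actually used by TCMChat may differ.
SCENARIOS = {
    "knowledge", "choice", "reading", "entity", "diagnosis", "recommendation",
}

def to_sft_record(scenario: str, question: str, answer: str) -> dict:
    """Package one curated QA pair as a single-turn instruction-tuning record."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    return {
        "scenario": scenario,
        "conversations": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ],
    }

# Example: one herb-recommendation pair serialized as a JSON line.
record = to_sft_record(
    "recommendation",
    "Which herbs are commonly paired with ginseng to tonify qi?",
    "Astragalus (Huang Qi) is a classic qi-tonifying partner of ginseng.",
)
print(json.dumps(record, ensure_ascii=False))
```

A chat-style record like this maps directly onto the conversation template of a chat foundation model such as Baichuan2-7B-Chat during SFT.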
Pages: 15