PMC-LLaMA: toward building open-source language models for medicine

Cited by: 42
Authors
Wu, Chaoyi [1 ,2 ]
Lin, Weixiong [1 ,2 ]
Zhang, Xiaoman [1 ,2 ]
Zhang, Ya [1 ,2 ]
Xie, Weidi [1 ,2 ]
Wang, Yanfeng [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr CM, Shanghai 200240, Peoples R China
[2] Shanghai AI Lab, Shanghai 200232, Peoples R China
Funding
National Key R&D Program of China;
Keywords
large language models; biomedical NLP; generative language models; ChatGPT;
DOI
10.1093/jamia/ocae045
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Objective: Recently, large language models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering (QA) situations, these models frequently struggle in domains that require precision, such as medical applications, because they lack domain-specific knowledge. In this article, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
Materials and Methods: We adapt a general-purpose LLM to the medical domain through data-centric knowledge injection, integrating 4.8M biomedical academic papers and 30K medical textbooks, followed by comprehensive domain-specific instruction fine-tuning on medical QA, reasoning rationales, and conversational dialogues, totaling 202M tokens.
Results: In evaluations on various public medical QA benchmarks and in manual ratings, our lightweight PMC-LLaMA, with only 13B parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets for instruction tuning will be released to the research community.
Discussion: Our contributions are threefold: (1) we build an open-source LLM for the medical domain; we believe PMC-LLaMA can promote further development of foundation models in medicine, serving as a trainable, medically grounded generative language backbone; (2) we conduct thorough ablation studies that demonstrate the effectiveness of each proposed component and show how different training data and model scales affect medical LLMs; (3) we contribute a large-scale, comprehensive dataset for instruction tuning.
Conclusion: In this article, we systematically investigate the process of building an open-source, medical-specific LLM, PMC-LLaMA.
Pages: 1833-1843
Number of pages: 11
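The abstract above summarizes a two-stage adaptation recipe: continued pretraining on biomedical text for knowledge injection, then domain-specific instruction fine-tuning. The following is a minimal sketch of how such a pipeline might look with the Hugging Face transformers Trainer, assuming a LLaMA-13B base checkpoint and placeholder JSONL files for the biomedical corpus and the instruction data; none of these names, paths, or hyperparameters come from the paper or its released code.

```python
# Illustrative sketch only (not the authors' released training code): two-stage domain
# adaptation of a general-purpose causal LM -- (1) continued pretraining on biomedical
# text, (2) instruction fine-tuning. Dataset files and the base checkpoint are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "huggyllama/llama-13b"            # assumed base checkpoint, not specified here
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token            # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(batch):
    # Plain next-token-prediction formatting; instruction data would be pre-formatted
    # into prompt/response strings in the "text" field before this step.
    return tok(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tok, mlm=False)  # causal LM labels

# Stage 1: knowledge injection -- continued pretraining on papers and textbooks.
corpus = load_dataset("json", data_files="biomedical_corpus.jsonl")["train"].map(
    tokenize, batched=True, remove_columns=["text"])
Trainer(
    model=model,
    args=TrainingArguments("pmc_llama_pretrain", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-5, bf16=True),
    train_dataset=corpus,
    data_collator=collator,
).train()

# Stage 2: instruction tuning -- medical QA, reasoning rationales, dialogues.
instructions = load_dataset("json", data_files="medical_instructions.jsonl")["train"].map(
    tokenize, batched=True, remove_columns=["text"])
Trainer(
    model=model,
    args=TrainingArguments("pmc_llama_instruct", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=3,
                           learning_rate=2e-5, bf16=True),
    train_dataset=instructions,
    data_collator=collator,
).train()
```

At the scale reported in the abstract (4.8M papers, 30K textbooks, 202M instruction tokens, 13B parameters), training would require a distributed multi-GPU setup rather than the single-process Trainer run shown here; the sketch only conveys the order and nature of the two stages.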