PMC-LLaMA: toward building open-source language models for medicine

Cited by: 42
Authors
Wu, Chaoyi [1 ,2 ]
Lin, Weixiong [1 ,2 ]
Zhang, Xiaoman [1 ,2 ]
Zhang, Ya [1 ,2 ]
Xie, Weidi [1 ,2 ]
Wang, Yanfeng [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr CM, Shanghai 200240, Peoples R China
[2] Shanghai AI Lab, Shanghai 200232, Peoples R China
Funding
National Key R&D Program of China;
Keywords
large language models; biomedical NLP; generative language models; ChatGPT;
DOI
10.1093/jamia/ocae045
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Objective: Recently, large language models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering (QA) situations, these models frequently struggle in domains that require precision, such as medical applications, because they lack domain-specific knowledge. In this article, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
Materials and Methods: We adapt a general-purpose LLM to the medical domain through data-centric knowledge injection, integrating 4.8M biomedical academic papers and 30K medical textbooks, followed by comprehensive domain-specific instruction fine-tuning on medical QA, reasoning rationales, and conversational dialogues, totaling 202M tokens.
Results: In evaluations on various public medical QA benchmarks and in manual ratings, our lightweight PMC-LLaMA, with only 13B parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets for instruction tuning will be released to the research community.
Discussion: Our contributions are threefold: (1) we build an open-source LLM for the medical domain; we believe PMC-LLaMA can promote further development of foundation models in medicine, serving as a trainable, medically grounded generative language backbone; (2) we conduct thorough ablation studies that demonstrate the effectiveness of each proposed component and show how different training data and model scales affect medical LLMs; (3) we contribute a large-scale, comprehensive dataset for instruction tuning.
Conclusion: In this article, we systematically investigate the process of building an open-source, medical-specific LLM, PMC-LLaMA.
Pages: 1833-1843
Number of pages: 11
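The abstract above summarizes a two-stage adaptation recipe: continued pretraining on biomedical text for knowledge injection, then domain-specific instruction fine-tuning. The following is a minimal sketch of how such a pipeline might look with the Hugging Face transformers Trainer, assuming a LLaMA-13B base checkpoint and placeholder JSONL files for the biomedical corpus and the instruction data; none of these names, paths, or hyperparameters come from the paper or its released code.

```python
# Illustrative sketch only (not the authors' released training code): two-stage domain
# adaptation of a general-purpose causal LM -- (1) continued pretraining on biomedical
# text, (2) instruction fine-tuning. Dataset files and the base checkpoint are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "huggyllama/llama-13b"            # assumed base checkpoint, not specified here
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token            # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(batch):
    # Plain next-token-prediction formatting; instruction data would be pre-formatted
    # into prompt/response strings in the "text" field before this step.
    return tok(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tok, mlm=False)  # causal LM labels

# Stage 1: knowledge injection -- continued pretraining on papers and textbooks.
corpus = load_dataset("json", data_files="biomedical_corpus.jsonl")["train"].map(
    tokenize, batched=True, remove_columns=["text"])
Trainer(
    model=model,
    args=TrainingArguments("pmc_llama_pretrain", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-5, bf16=True),
    train_dataset=corpus,
    data_collator=collator,
).train()

# Stage 2: instruction tuning -- medical QA, reasoning rationales, dialogues.
instructions = load_dataset("json", data_files="medical_instructions.jsonl")["train"].map(
    tokenize, batched=True, remove_columns=["text"])
Trainer(
    model=model,
    args=TrainingArguments("pmc_llama_instruct", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=3,
                           learning_rate=2e-5, bf16=True),
    train_dataset=instructions,
    data_collator=collator,
).train()
```

At the scale reported in the abstract (4.8M papers, 30K textbooks, 202M instruction tokens, 13B parameters), training would require a distributed multi-GPU setup rather than the single-process Trainer run shown here; the sketch only conveys the order and nature of the two stages.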