Benchmarking Biomedical Relation Knowledge in Large Language Models

Cited by: 0
Authors
Zhang, Fenghui [1 ]
Yang, Kuo [1 ]
Zhao, Chenqian [1 ]
Li, Haixu [1 ]
Dong, Xin [1 ]
Tian, Haoyu [1 ]
Zhou, Xuezhong [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing Key Lab Traff Data Anal & Min, Inst Med Intelligence, Beijing 100044, Peoples R China
Source
BIOINFORMATICS RESEARCH AND APPLICATIONS, PT II, ISBRA 2024 | 2024 / Vol. 14955
Funding
National Natural Science Foundation of China;
Keywords
biomedical knowledge evaluation; large language model; biomedical relationship identification; benchmarking;
DOI
10.1007/978-981-97-5131-0_41
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a special kind of knowledge base (KB), a large language model (LLM) stores a great deal of knowledge in the parameters of a deep neural network, and evaluating the accuracy of the knowledge within this KB has emerged as a key area of interest in LLM research. Although many evaluation studies of LLM knowledge have been carried out, few target biomedical knowledge, owing to its complexity and scarcity. To address this, we designed five identification and evaluation tasks for the biomedical knowledge in LLMs, covering the identification of genes for diseases, targets for drugs/compounds, drugs for diseases, and effectiveness for herbs. We selected four well-known LLMs, namely GPT-3.5-turbo, GPT-4, ChatGLM-std, and LLaMA2-13B, to quantify the quality of biomedical knowledge in LLMs. Comprehensive experiments, including an overall evaluation of accuracy and completeness, ablation analysis, few-shot prompt optimization, and a case study, benchmarked the performance of LLMs in identifying biomedical knowledge and assessed the quality of the biomedical knowledge implicit in them. The experimental results yield several interesting observations, e.g., the incompleteness and bias of knowledge across different LLMs, which offer insight into the use of LLMs for biomedical discovery and application.
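To make the evaluation setup concrete, below is a minimal sketch of how one of the five tasks (disease-gene identification) could be prompted and scored against a reference set using the OpenAI chat API. The prompt wording, the gold-standard pairs, and the precision/recall scoring are illustrative assumptions for this sketch, not the paper's actual protocol or data.

```python
# Hypothetical sketch: query an LLM for disease-gene associations and score
# accuracy (precision) and completeness (recall) against a small gold set.
# Prompt text, gold data, and metrics are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy gold standard: disease -> set of associated gene symbols (illustrative).
GOLD = {
    "cystic fibrosis": {"CFTR"},
    "sickle cell anemia": {"HBB"},
}

# A simple few-shot style prompt prefix (assumed, not taken from the paper).
FEW_SHOT = (
    "Identify genes associated with the given disease. "
    "Answer with gene symbols separated by commas.\n"
    "Disease: Huntington disease\nGenes: HTT\n"
)

def query_genes(disease: str, model: str = "gpt-3.5-turbo") -> set[str]:
    """Ask the model for genes linked to a disease and parse the reply."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": FEW_SHOT + f"Disease: {disease}\nGenes:"}],
        temperature=0,
    )
    text = resp.choices[0].message.content or ""
    return {g.strip().upper() for g in text.replace("\n", ",").split(",") if g.strip()}

def evaluate(model: str = "gpt-3.5-turbo") -> None:
    """Report per-disease precision (accuracy) and recall (completeness)."""
    for disease, gold_genes in GOLD.items():
        pred = query_genes(disease, model)
        tp = len(pred & gold_genes)
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold_genes)
        print(f"{disease}: precision={precision:.2f} recall={recall:.2f}")

if __name__ == "__main__":
    evaluate()
```

In such a setup, precision reflects how accurate the model's stated associations are, while recall against the reference KB reflects how complete its stored knowledge is; the same pattern extends to the drug-target, drug-disease, and herb-effectiveness tasks described in the abstract.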
Pages: 482-495
Page count: 14