A survey of datasets in medicine for large language models

Cited by: 0

Authors:

Zhang, Deshiwei [1]

Xue, Xiaojuan [2]

Gao, Peng [3]

Jin, Zhijuan [4]

Hu, Menghan [2]

Wu, Yue [5]

Ying, Xiayang [6]

Affiliations:

[1] Southeast Univ, Sch Civil Engn, Nanjing 210096, Jiangsu, Peoples R China

[2] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, 500 Dongchuan Rd, Shanghai 200241, Peoples R China

[3] Tongji Univ, Shanghai Peoples Hosp 10, Sch Med, Dept Ophthalmol, Shanghai 200072, Peoples R China

[4] Shanghai Jiao Tong Univ, Shanghai Childrens Med Ctr, Sch Med, Dept Dev & Behav Pediat, Shanghai 200127, Peoples R China

[5] Shanghai Jiao Tong Univ, Peoples Hosp 9, Sch Med, Dept Ophthalmol, Shanghai 200011, Peoples R China

[6] Shanghai Jiao Tong Univ, Ruijin Hosp, Pancreat Dis Ctr, Sch Med,Dept Gen Surg, 197 Ruijin 2nd Rd, Shanghai 200001, Peoples R China

Source:

INTELLIGENCE & ROBOTICS | 2024 / Vol. 4 / No. 4

Keywords:

Large language models (LLMs); NLP; dataset in medicine; Q&A system in medicine

DOI:

10.20517/ir.2024.27

CLC classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

With the advent of ChatGPT and similar models, large language models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language, presenting novel opportunities and challenges in medicine. While many studies have focused on applying LLMs in medicine, comprehensive reviews of the datasets used in this field remain scarce. This survey seeks to address this gap by providing a comprehensive overview of the medical datasets that fuel LLMs, highlighting their unique characteristics and the critical roles they play at each stage of LLM development: pre-training, fine-tuning, and evaluation. Ultimately, this survey aims to underline the significance of datasets in realizing the full potential of LLMs to innovate and improve healthcare outcomes.

Pages: 457-478

Page count: 22
