Development of a large-scale medical visual question-answering dataset

Cited by: 2
Authors
Zhang, Xiaoman [1, 2]
Wu, Chaoyi [1, 2]
Zhao, Ziheng [1, 2]
Lin, Weixiong [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Source
COMMUNICATIONS MEDICINE | 2024 / Vol. 4 / Issue 01
DOI
10.1038/s43856-024-00709-2
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Subject classification code
1001
Abstract
Background: Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.
Methods: We constructed a large-scale medical visual question-answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was first trained on PMC-VQA and then fine-tuned on multiple public benchmarks.
Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure for monitoring the progress of generative MedVQA methods.
Conclusions: The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.
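The abstract states only that the model aligns visual information from a pre-trained vision encoder with a large language model. One common way to do this is sketched here, assuming a single learned linear projection that maps each visual feature into the LLM's token-embedding space and prepends the projected visual tokens to the question's text embeddings. The function names, the toy dimensions, and the choice of a linear layer are illustrative assumptions. The paper's actual alignment module may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # assumed shape: rows = d_vision, cols = d_llm


def project(features: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Hypothetical alignment step: map each d_vision-dim feature x to y = x @ W + b."""
    d_in, d_out = len(weight), len(bias)
    tokens: List[Vector] = []
    for x in features:
        if len(x) != d_in:
            raise ValueError(f"expected feature dim {d_in}, got {len(x)}")
        tokens.append(
            [bias[j] + sum(x[i] * weight[i][j] for i in range(d_in)) for j in range(d_out)]
        )
    return tokens


def build_llm_input(
    visual_features: List[Vector],
    weight: Matrix,
    bias: Vector,
    text_embeddings: List[Vector],
) -> List[Vector]:
    """Prepend the projected visual tokens to the question's text-token embeddings."""
    d_llm = len(bias)
    for t in text_embeddings:
        if len(t) != d_llm:
            raise ValueError(f"text embedding dim must be {d_llm}, got {len(t)}")
    return project(visual_features, weight, bias) + text_embeddings


# Two toy 3-dim "patch" features projected into a 2-dim "LLM" embedding space.
features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.5, -0.5]
question = [[0.1, 0.2]]
seq = build_llm_input(features, W, b, question)
print(seq)  # [[3.5, 1.5], [1.5, 1.5], [0.1, 0.2]]
```

In a real system the projection weights would be learned during training, and the language model would then generate the free-form answer token by token from this combined sequence. That generation step is outside the scope of this sketch.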
Pages: 13