Development of a large-scale medical visual question-answering dataset

Times cited: 2
Authors
Zhang, Xiaoman [1, 2]
Wu, Chaoyi [1, 2]
Zhao, Ziheng [1, 2]
Lin, Weixiong [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China
[2] Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China
Source
COMMUNICATIONS MEDICINE | 2024 / Vol. 4 / No. 1
DOI
10.1038/s43856-024-00709-2
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Subject classification code
1001
Abstract
Background: Medical Visual Question Answering (MedVQA) uses artificial intelligence to interpret medical images and can enhance diagnostic accuracy and healthcare delivery. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.

Methods: We constructed a large-scale medical visual question-answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. The model was first trained on PMC-VQA and then fine-tuned on multiple public benchmarks.

Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that is more challenging and serves as a robust measure for tracking the progress of generative MedVQA methods.

Conclusions: The PMC-VQA dataset is an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.
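The Methods summary states only that the generative model aligns visual information from a pre-trained vision encoder with a large language model. One common general way to do this (assumed here, not taken from the paper) is to project each visual feature vector into the language model's token-embedding space and prepend the resulting "visual tokens" to the embedded question before answer generation. The minimal pure-Python sketch below shows only that projection-and-concatenation step; the function names `project` and `build_llm_input`, the single linear layer, and the toy dimensions are all hypothetical illustrations.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(features: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical linear map from vision-feature space to LLM embedding space.

    `weight` has one row per LLM embedding dimension and one column per
    vision-feature dimension; `bias` has one entry per LLM dimension.
    """
    if len(weight) != len(bias):
        raise ValueError("weight rows and bias must have the same length")
    for row in weight:
        if len(row) != len(features):
            raise ValueError("each weight row must match the feature dimension")
    # output[i] = sum_j weight[i][j] * features[j] + bias[i]
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weight, bias)]


def build_llm_input(patch_features: List[Vector],
                    question_embeddings: List[Vector],
                    weight: Matrix, bias: Vector) -> List[Vector]:
    """Prepend projected 'visual tokens' to the embedded question tokens.

    The combined sequence is what a decoder-only language model would consume
    before generating a free-form answer (assumed design, illustration only).
    """
    visual_tokens = [project(f, weight, bias) for f in patch_features]
    return visual_tokens + question_embeddings


if __name__ == "__main__":
    # Toy sizes: 2 image patches with 3-dim vision features, LLM width 2.
    patches = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
    weight = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
    bias = [0.5, -0.5]
    question = [[0.1, 0.2], [0.3, 0.4]]  # two already-embedded question tokens
    print(build_llm_input(patches, question, weight, bias))
    # → [[3.5, -0.5], [1.5, 1.5], [0.1, 0.2], [0.3, 0.4]]
```

The sketch omits training entirely: in a real system the projection weights would be learned from image-question-answer pairs, such as those in PMC-VQA, rather than set by hand.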
Pages: 13