Development of a large-scale medical visual question-answering dataset

Cited by: 2
Authors
Zhang, Xiaoman [1, 2]
Wu, Chaoyi [1, 2]
Zhao, Ziheng [1, 2]
Lin, Weixiong [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Source
COMMUNICATIONS MEDICINE | 2024 / Vol. 4 / Issue 01
DOI
10.1038/s43856-024-00709-2
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Background: Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.
Methods: We constructed a large-scale medical visual-question answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.
Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.
Conclusions: The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.
Pages: 13