Development of a large-scale medical visual question-answering dataset

Cited by: 2
Authors
Zhang, Xiaoman [1, 2]
Wu, Chaoyi [1, 2]
Zhao, Ziheng [1, 2]
Lin, Weixiong [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Source
COMMUNICATIONS MEDICINE | 2024 / Volume 4 / Issue 01
DOI
10.1038/s43856-024-00709-2
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine];
Discipline classification code
1001;
Abstract
Background: Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.
Methods: We constructed a large-scale medical visual question-answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.
Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.
Conclusions: The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.
Pages: 13