Development of a large-scale medical visual question-answering dataset

Cited by: 2

Authors:

Zhang, Xiaoman [1,2]

Wu, Chaoyi [1,2]

Zhao, Ziheng [1,2]

Lin, Weixiong [1,2]

Zhang, Ya [1,2]

Wang, Yanfeng [1,2]

Xie, Weidi [1,2]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China

[2] Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China

Source:

COMMUNICATIONS MEDICINE | 2024 / Volume 4 / Issue 01

Keywords:

DOI:

10.1038/s43856-024-00709-2

Chinese Library Classification (CLC) number:

R-3 [Medical research methods]; R3 [Basic medicine]

Discipline classification code:

1001

Abstract:

Background: Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.

Methods: We constructed a large-scale medical visual question-answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.

Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.

Conclusions: The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.


Pages: 13
