Development of a large-scale medical visual question-answering dataset

Cited by: 2

Authors:

Zhang, Xiaoman [1,2]

Wu, Chaoyi [1,2]

Zhao, Ziheng [1,2]

Lin, Weixiong [1,2]

Zhang, Ya [1,2]

Wang, Yanfeng [1,2]

Xie, Weidi [1,2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China

Source:

COMMUNICATIONS MEDICINE | 2024 / Volume 4 / Issue 01

DOI:

10.1038/s43856-024-00709-2

Chinese Library Classification (CLC) code:

R-3 [Medical research methods]; R3 [Basic medicine]

Discipline classification code:

1001

Abstract:

Background: Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.

Methods: We constructed a large-scale medical visual-question answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.

Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.

Conclusions: The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.

Pages: 13
