Development of a large-scale medical visual question-answering dataset

Cited by: 2

Authors:

Zhang, Xiaoman [1,2]

Wu, Chaoyi [1,2]

Zhao, Ziheng [1,2]

Lin, Weixiong [1,2]

Zhang, Ya [1,2]

Wang, Yanfeng [1,2]

Xie, Weidi [1,2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China

Source:

COMMUNICATIONS MEDICINE | 2024 / Volume 4 / Issue 01

Keywords:

DOI:

10.1038/s43856-024-00709-2

Chinese Library Classification number:

R-3 [Medical research methods]; R3 [Basic medicine];

Discipline classification code:

1001 ;

Abstract:

Background: Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.

Methods: We constructed a large-scale medical visual question-answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.

Results: Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.

Conclusions: The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.

Pages: 13
