A Bi-level representation learning model for medical visual question answering

Cited by: 7

Authors:

Li, Yong [1]

Long, Shaopei [1]

Yang, Zhenguo [2]

Weng, Heng [3]

Zeng, Kun [4]

Huang, Zhenhua [1]

Wang, Fu Lee [5]

Hao, Tianyong [1]

Affiliations:

[1] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China

[2] Guangdong Univ Technol, Sch Comp Sci, Guangzhou, Peoples R China

[3] Guangzhou Univ Chinese Med, State Key Lab Dampness Syndrome Chinese Med, Affiliated Hosp 2, Guangzhou, Peoples R China

[4] Sun Yat sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China

[5] Hong Kong Metropolitan Univ, Sch Sci & Technol, Hong Kong, Peoples R China

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2022 / Vol. 134

Funding:

National Natural Science Foundation of China;

Keywords:

Medical visual question answering; Token-level reasoning; Sentence-level reasoning; Label-distribution-smooth margin loss;

DOI:

10.1016/j.jbi.2022.104183

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Medical Visual Question Answering (VQA) aims to answer questions about given medical images and has tremendous potential in healthcare services. However, research on medical VQA still faces challenges, particularly in learning a fine-grained multimodal semantic representation for answer prediction from a relatively small volume of data. Moreover, the long-tailed label distribution of medical VQA data frequently degrades model performance. To this end, we propose a novel bi-level representation learning model with two reasoning modules for the medical VQA task. The first is sentence-level reasoning, which learns sentence-level semantic representations from the multimodal input. The second is token-level reasoning, which employs an attention mechanism to generate a multimodal contextual vector by fusing image features and word embeddings. The contextual vector is used to filter irrelevant semantic representations out of the sentence-level reasoning output, yielding a fine-grained multimodal representation. Furthermore, a label-distribution-smooth margin loss is proposed to minimize the generalization error bound on long-tailed datasets by modifying the margin bound of different labels in the training set. On the standard VQA-Rad and PathVQA datasets, the proposed model achieves accuracies of 0.7605 and 0.5434 and F1-scores of 0.7741 and 0.5288, respectively, outperforming a set of state-of-the-art baseline models.
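The abstract does not give the formula for the label-distribution-smooth margin loss, so the following is only a rough sketch of the general idea of per-label margins on long-tailed data. It assumes a generic label-distribution-aware margin in which the margin of label j is proportional to n_j^(-1/4), where n_j is the training count of that label; this choice, the function names, and the example counts are all illustrative assumptions, not the authors' formulation.

```python
import math


def class_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** -0.25 (an assumed scheme).

    The margins are rescaled so that the rarest class receives max_margin.
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def margin_cross_entropy(logits, target, margins):
    """Softmax cross-entropy after lowering the target logit by its margin."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    peak = max(adjusted)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in adjusted))
    return log_sum - adjusted[target]


# Hypothetical long-tailed answer frequencies: frequent, medium, rare.
counts = [1000, 100, 10]
margins = class_margins(counts)

# Hypothetical logits for one sample whose true answer is the rare class.
logits = [2.0, 1.0, 0.5]
plain = margin_cross_entropy(logits, 2, [0.0, 0.0, 0.0])
with_margin = margin_cross_entropy(logits, 2, margins)
```

In this sketch the rare label gets the largest margin, so a sample of that label incurs a higher loss than under plain cross-entropy until its logit clears the other logits by a wider gap.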


Pages: 12
