A Bi-level representation learning model for medical visual question answering

Cited by: 7
Authors
Li, Yong [1]
Long, Shaopei [1]
Yang, Zhenguo [2]
Weng, Heng [3]
Zeng, Kun [4]
Huang, Zhenhua [1]
Wang, Fu Lee [5]
Hao, Tianyong [1]
Affiliations
[1] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
[2] Guangdong Univ Technol, Sch Comp Sci, Guangzhou, Peoples R China
[3] Guangzhou Univ Chinese Med, State Key Lab Dampness Syndrome Chinese Med, Affiliated Hosp 2, Guangzhou, Peoples R China
[4] Sun Yat sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[5] Hong Kong Metropolitan Univ, Sch Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Medical visual question answering; Token-level reasoning; Sentence-level reasoning; Label-distribution-smooth margin loss;
DOI
10.1016/j.jbi.2022.104183
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Medical Visual Question Answering (VQA) aims to answer questions about given medical images and has tremendous potential in healthcare services. However, research on medical VQA still faces challenges, particularly in learning a fine-grained multimodal semantic representation for answer prediction from a relatively small volume of data. Moreover, the long-tailed label distribution of medical VQA data frequently results in poor model performance. To this end, we propose a novel bi-level representation learning model with two reasoning modules for the medical VQA task. The first is sentence-level reasoning, which learns sentence-level semantic representations from the multimodal input. The second is token-level reasoning, which employs an attention mechanism to generate a multimodal contextual vector by fusing image features and word embeddings. The contextual vector is used to filter irrelevant semantic representations out of the sentence-level reasoning output, yielding a fine-grained multimodal representation. Furthermore, a label-distribution-smooth margin loss is proposed to minimize the generalization error bound on long-tailed datasets by modifying the margin bound of different labels in the training set. On the standard VQA-Rad and PathVQA datasets, the proposed model achieves accuracies of 0.7605 and 0.5434 and F1-scores of 0.7741 and 0.5288, respectively, outperforming a set of state-of-the-art baseline models.
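The abstract does not give the formula of the label-distribution-smooth margin loss, so the following is only a rough sketch of the general idea of a label-dependent margin for long-tailed labels. It assumes a margin proportional to n_j^(-1/4), as in the label-distribution-aware margin (LDAM) loss, as a stand-in; it is not the paper's loss. The function names, the `max_margin` parameter, and the toy label counts are hypothetical.

```python
import math


def class_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4) (an assumed, LDAM-style rule).

    Margins are rescaled so that the rarest class receives max_margin.
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def margin_cross_entropy(logits, target, margins):
    """Cross-entropy computed after subtracting the target class's margin
    from the target logit, so rare classes must be predicted with more slack."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    peak = max(adjusted)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in adjusted))
    return log_sum_exp - adjusted[target]


# Toy long-tailed answer distribution: one frequent label, two rarer ones.
counts = [900, 90, 10]
margins = class_margins(counts)

logits = [2.0, 1.0, 0.5]
plain = margin_cross_entropy(logits, 2, [0.0, 0.0, 0.0])
with_margin = margin_cross_entropy(logits, 2, margins)
```

In this sketch the rarest label gets the largest margin, so a training example with that label incurs a higher loss than under plain cross-entropy for the same logits, which pushes the model toward a wider decision margin for tail answers.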
Pages: 12