Exploring and exploiting model uncertainty for robust visual question answering

Cited by: 0
Authors
Zhang, Xuesong [1 ]
He, Jun [2 ]
Zhao, Jia [3 ]
Hu, Zhenzhen [1 ]
Yang, Xun [4 ]
Li, Jia [1 ]
Hong, Richang [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Dataspace, Hefei, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Visual question answering; Language bias; Uncertainty; Confidence; LANGUAGE;
DOI
10.1007/s00530-024-01560-0
CLC number
TP [Automation & Computer Technology];
Discipline code
0812 ;
Abstract
Visual Question Answering (VQA) methods have been widely shown to answer questions with bias because the answer distributions of the training and test sets differ, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we offer a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models often give obviously incorrect answers with high confidence, i.e., they are overconfident, which indicates high uncertainty. In light of this observation, we (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method that addresses the overconfidence issue, making the model more reliable and better at generalization. The calibration method explicitly constrains model predictions so that the model is less confident during training; it is model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually generalize worse, and that both their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty
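The calibration idea sketched in the abstract, constraining predictions so the model is less confident during training, can be illustrated with a generic confidence-penalty regularizer (cross-entropy minus a scaled entropy bonus). This is a standard technique, not the paper's exact method: the function name, the hyperparameter `beta`, and the entropy form below are all assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_penalized_loss(logits, targets, beta=0.1):
    """Cross-entropy minus beta times mean predictive entropy.

    Subtracting the entropy term rewards flatter (less confident)
    predictive distributions, discouraging overconfident answers.
    `beta` is an assumed regularization weight, not a value from the paper.
    """
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), targets] + 1e-12).mean()
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce - beta * entropy
```

Because the entropy term is non-negative, the penalized loss is never larger than plain cross-entropy; during training, gradients push the model toward higher-entropy (less overconfident) outputs.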
Pages: 14
Related Papers
50 records in total
  • [21] VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task
    Bi, Yandong
    Jiang, Huajie
    Liu, Jing
    Liu, Mengting
    Hu, Yongli
    Yin, Baocai
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 264 - 277
  • [22] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
  • [23] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
    Lofstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [24] A visual question answering model based on image captioning
    Zhou, Kun
    Liu, Qiongjie
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [25] Be flexible! learn to debias by sampling and prompting for robust visual question answering
    Liu, Jin
    Fan, ChongFeng
    Zhou, Fengyu
    Xu, Huijuan
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [26] Robust visual question answering via semantic cross modal augmentation
    Mashrur, Akib
    Luo, Wei
    Zaidi, Nayyar A.
    Robles-Kelly, Antonio
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 238
  • [27] Robust Visual Question Answering Based on Counterfactual Samples and Relationship Perception
    Qin, Hong
    An, Gaoyun
    Ruan, Qiuqi
    IMAGE AND GRAPHICS TECHNOLOGIES AND APPLICATIONS, IGTA 2021, 2021, 1480 : 145 - 158
  • [28] HCCL: Hierarchical Counterfactual Contrastive Learning for Robust Visual Question Answering
    Hao, Dongze
    Wang, Qunbo
    Zhu, Xinxin
    Liu, Jing
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (10)
  • [29] Visual Question Answering With a Hybrid Convolution Recurrent Model
    Harzig, Philipp
    Eggert, Christian
    Lienhart, Rainer
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 318 - 325
  • [30] Robust data augmentation and contrast learning for debiased visual question answering
    Ning, Ke
    Li, Zhixin
    NEUROCOMPUTING, 2025, 626