Exploring and exploiting model uncertainty for robust visual question answering

Cited: 0
Authors
Zhang, Xuesong [1 ]
He, Jun [2 ]
Zhao, Jia [3 ]
Hu, Zhenzhen [1 ]
Yang, Xun [4 ]
Li, Jia [1 ]
Hong, Richang [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Dataspace, Hefei, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual question answering; Language bias; Uncertainty; Confidence; Language
DOI
10.1007/s00530-024-01560-0
Chinese Library Classification
TP [Automation technology; Computer technology]
Discipline Code
0812
Abstract
Visual Question Answering (VQA) methods have been widely shown to exhibit bias in answering questions because the distribution of answers differs between the training and test sets, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards giving obviously incorrect answers with high confidence, i.e., they are overconfident, which indicates poorly calibrated uncertainty. In light of this observation, we: (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method that addresses the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly constrains model predictions to make the model less confident during training; it has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually generalize worse, and that both their performance and their trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty.
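This record does not reproduce the paper's actual overconfidence metric or calibration loss (see the linked repository for those). As a rough, hypothetical sketch of the two ideas the abstract describes — monitoring the gap between a model's confidence and its accuracy, and penalizing overconfident predictions during training — the following Python/PyTorch snippet pairs a standard expected-calibration-error-style monitor with an entropy-bonus confidence penalty. The function names, the `beta` weight, and the use of plain cross-entropy over answer indices are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F

def expected_calibration_error(confidences, correct, n_bins=10):
    """Hypothetical overconfidence monitor: bin predictions by confidence and
    average |accuracy - confidence| per bin, weighted by bin size.
    Large values flag a model whose confidence outruns its accuracy."""
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = confidences.new_zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = (correct[in_bin].float().mean() - confidences[in_bin].mean()).abs()
            ece = ece + in_bin.float().mean() * gap
    return ece.item()

def calibrated_loss(logits, answer_ids, beta=0.1):
    """Hypothetical calibration objective: cross-entropy minus a weighted
    entropy bonus, so maximally confident (low-entropy) predictions are
    discouraged during training."""
    ce = F.cross_entropy(logits, answer_ids)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    return ce - beta * entropy
```

Because the penalty only modifies the training loss, a sketch like this can be dropped into any VQA model's training loop without architectural changes, consistent with the abstract's claim that the calibration method is model-agnostic and computationally efficient.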
Pages: 14