Exploring and exploiting model uncertainty for robust visual question answering

Times cited: 0
Authors
Zhang, Xuesong [1 ]
He, Jun [2 ]
Zhao, Jia [3 ]
Hu, Zhenzhen [1 ]
Yang, Xun [4 ]
Li, Jia [1 ]
Hong, Richang [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Dataspace, Hefei, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
关键词
Visual question answering; Language bias; Uncertainty; Confidence
DOI
10.1007/s00530-024-01560-0
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Visual Question Answering (VQA) methods have been widely shown to exhibit bias in answering questions due to distribution differences of answer samples between training and testing, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards giving obviously incorrect answers with high confidence, i.e., being overconfident, which indicates poorly calibrated uncertainty. In light of this observation, we: (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method to address the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly imposes constraints on model predictions to make the model less confident during training. It has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually suffer in terms of generalization, and fortunately their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty
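The calibration constraint described in the abstract (constraining predictions so the model is less confident during training) can be sketched generically as an entropy bonus added to the cross-entropy objective. This is an illustrative stand-in, not the paper's actual formulation; the function names and the `beta` hyperparameter are assumptions for the sketch.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_penalty_loss(logits, targets, beta=0.1):
    """Cross-entropy minus beta times the mean prediction entropy.

    Penalizing low-entropy (overconfident) output distributions is one
    generic way to impose a "less confident during training" constraint;
    the paper's exact calibration term may differ.
    """
    probs = softmax(logits)
    n = logits.shape[0]
    # Standard cross-entropy on the target classes.
    ce = -np.log(probs[np.arange(n), targets] + 1e-12).mean()
    # Mean entropy of the predicted distributions (always >= 0).
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean()
    # Subtracting entropy rewards flatter, less confident predictions.
    return ce - beta * entropy
```

Because the entropy term is non-negative, the penalized loss is never larger than the plain cross-entropy, and minimizing it trades a small amount of training fit for softer output distributions.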
Pages: 14