Exploring and exploiting model uncertainty for robust visual question answering

Cited: 0
Authors
Zhang, Xuesong [1 ]
He, Jun [2 ]
Zhao, Jia [3 ]
Hu, Zhenzhen [1 ]
Yang, Xun [4 ]
Li, Jia [1 ]
Hong, Richang [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Dataspace, Hefei, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Language bias; Uncertainty; Confidence; LANGUAGE;
DOI
10.1007/s00530-024-01560-0
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812;
Abstract
Visual Question Answering (VQA) methods have been widely shown to answer questions in a biased way, owing to differences in the answer distributions of the training and test sets, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards giving obviously incorrect answers with high confidence, i.e., being overconfident, which indicates high uncertainty. In light of this observation, we (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method to address the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly imposes constraints on model predictions to make the model less confident during training. It has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually generalize worse, and that their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty
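The calibration idea described in the abstract — constraining predictions so the model is less confident during training — is in the same family as a generic confidence-penalty (entropy-regularized) loss. The sketch below illustrates that generic idea only, not the authors' exact formulation; the function name, the `beta` weight, and the interface are hypothetical:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_penalized_loss(logits, labels, beta=0.1):
    """Cross-entropy plus an entropy bonus that discourages
    overconfident (low-entropy) predictions.

    logits: (N, C) float array; labels: (N,) int array;
    beta: penalty weight (hypothetical default).
    """
    probs = softmax(logits)
    n = logits.shape[0]
    # Standard cross-entropy on the gold labels.
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    # Mean predictive entropy; subtracting it rewards
    # less peaked (less overconfident) distributions.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean()
    return ce - beta * entropy
```

With `beta = 0` this reduces to plain cross-entropy; increasing `beta` lowers the loss of predictions that spread probability mass more evenly, which is one simple, model-agnostic way to discourage the overconfident behavior the paper studies.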
Pages: 14
Related papers
50 in total
  • [41] Combining Multiple Cues for Visual Madlibs Question Answering
    Tatiana Tommasi
    Arun Mallya
    Bryan Plummer
    Svetlana Lazebnik
    Alexander C. Berg
    Tamara L. Berg
    International Journal of Computer Vision, 2019, 127 : 38 - 60
  • [42] Visual question answering: a state-of-the-art review
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (08) : 5705 - 5745
  • [43] Scene Graph Refinement Network for Visual Question Answering
    Qian, Tianwen
    Chen, Jingjing
    Chen, Shaoxiang
    Wu, Bo
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3950 - 3961
  • [45] Transformer Gate Attention Model: An Improved Attention Model for Visual Question Answering
    Zhang, Haotian
    Wu, Wei
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [46] Question action relevance and editing for visual question answering
    Toor, Andeep S.
    Wechsler, Harry
    Nappi, Michele
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 2921 - 2935
  • [47] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [48] Question-Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [50] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549