Exploring and exploiting model uncertainty for robust visual question answering

Cited: 0
Authors
Zhang, Xuesong [1 ]
He, Jun [2 ]
Zhao, Jia [3 ]
Hu, Zhenzhen [1 ]
Yang, Xun [4 ]
Li, Jia [1 ]
Hong, Richang [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Dataspace, Hefei, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual question answering; Language bias; Uncertainty; Confidence; Language
DOI
10.1007/s00530-024-01560-0
Chinese Library Classification
TP [Automation technology; Computer technology]
Discipline Code
0812
Abstract
Visual Question Answering (VQA) methods have been widely shown to exhibit bias in answering questions because the distribution of answers differs between the training and test sets, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards giving obviously incorrect answers with high confidence, i.e., they are overconfident, which indicates poorly calibrated uncertainty. In light of this observation, we: (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method that addresses the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly constrains model predictions to make the model less confident during training; it has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually generalize worse, and that both their performance and their trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty.
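This record does not reproduce the paper's actual overconfidence metric or calibration loss (see the linked repository for those). As a rough, hypothetical sketch of the two ideas the abstract describes — monitoring the gap between a model's confidence and its accuracy, and penalizing overconfident predictions during training — the following Python/PyTorch snippet pairs a standard expected-calibration-error-style monitor with an entropy-bonus confidence penalty. The function names, the `beta` weight, and the use of plain cross-entropy over answer indices are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F

def expected_calibration_error(confidences, correct, n_bins=10):
    """Hypothetical overconfidence monitor: bin predictions by confidence and
    average |accuracy - confidence| per bin, weighted by bin size.
    Large values flag a model whose confidence outruns its accuracy."""
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = confidences.new_zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = (correct[in_bin].float().mean() - confidences[in_bin].mean()).abs()
            ece = ece + in_bin.float().mean() * gap
    return ece.item()

def calibrated_loss(logits, answer_ids, beta=0.1):
    """Hypothetical calibration objective: cross-entropy minus a weighted
    entropy bonus, so maximally confident (low-entropy) predictions are
    discouraged during training."""
    ce = F.cross_entropy(logits, answer_ids)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    return ce - beta * entropy
```

Because the penalty only modifies the training loss, a sketch like this can be dropped into any VQA model's training loop without architectural changes, consistent with the abstract's claim that the calibration method is model-agnostic and computationally efficient.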
Pages: 14