Exploring and exploiting model uncertainty for robust visual question answering

Cited by: 0
Authors
Zhang, Xuesong [1 ]
He, Jun [2 ]
Zhao, Jia [3 ]
Hu, Zhenzhen [1 ]
Yang, Xun [4 ]
Li, Jia [1 ]
Hong, Richang [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Dataspace, Hefei, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Visual question answering; Language bias; Uncertainty; Confidence; LANGUAGE;
DOI
10.1007/s00530-024-01560-0
CLC number
TP [Automation & Computer Technology];
Discipline code
0812 ;
Abstract
Visual Question Answering (VQA) methods have been widely shown to answer questions with bias because the answer distributions of the training and test sets differ, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we offer a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models often give obviously incorrect answers with high confidence, i.e., they are overconfident, which indicates high uncertainty. In light of this observation, we (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method that addresses the overconfidence issue, making the model more reliable and better at generalization. The calibration method explicitly constrains model predictions so that the model is less confident during training; it is model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually generalize worse, and that both their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty
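The calibration idea sketched in the abstract, constraining predictions so the model is less confident during training, can be illustrated with a generic confidence-penalty regularizer (cross-entropy minus a scaled entropy bonus). This is a standard technique, not the paper's exact method: the function name, the hyperparameter `beta`, and the entropy form below are all assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_penalized_loss(logits, targets, beta=0.1):
    """Cross-entropy minus beta times mean predictive entropy.

    Subtracting the entropy term rewards flatter (less confident)
    predictive distributions, discouraging overconfident answers.
    `beta` is an assumed regularization weight, not a value from the paper.
    """
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), targets] + 1e-12).mean()
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce - beta * entropy
```

Because the entropy term is non-negative, the penalized loss is never larger than plain cross-entropy; during training, gradients push the model toward higher-entropy (less overconfident) outputs.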
Pages: 14
Related Papers
50 records in total
  • [21] VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task
    Bi, Yandong
    Jiang, Huajie
    Liu, Jing
    Liu, Mengting
    Hu, Yongli
    Yin, Baocai
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 264 - 277
  • [22] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
  • [23] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
    Lofstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [24] A visual question answering model based on image captioning
    Zhou, Kun
    Liu, Qiongjie
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [25] Be flexible! learn to debias by sampling and prompting for robust visual question answering
    Liu, Jin
    Fan, ChongFeng
    Zhou, Fengyu
    Xu, Huijuan
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [26] Robust visual question answering via semantic cross modal augmentation
    Mashrur, Akib
    Luo, Wei
    Zaidi, Nayyar A.
    Robles-Kelly, Antonio
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 238
  • [27] Robust Visual Question Answering Based on Counterfactual Samples and Relationship Perception
    Qin, Hong
    An, Gaoyun
    Ruan, Qiuqi
    IMAGE AND GRAPHICS TECHNOLOGIES AND APPLICATIONS, IGTA 2021, 2021, 1480 : 145 - 158
  • [28] HCCL: Hierarchical Counterfactual Contrastive Learning for Robust Visual Question Answering
    Hao, Dongze
    Wang, Qunbo
    Zhu, Xinxin
    Liu, Jing
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (10)
  • [29] Visual Question Answering With a Hybrid Convolution Recurrent Model
    Harzig, Philipp
    Eggert, Christian
    Lienhart, Rainer
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 318 - 325
  • [30] Robust data augmentation and contrast learning for debiased visual question answering
    Ning, Ke
    Li, Zhixin
    NEUROCOMPUTING, 2025, 626