Generative Bias for Robust Visual Question Answering

Cited by: 25

Authors:

Cho, Jae Won [1]

Kim, Dong-Jin [2]

Ryu, Hyeonggon [1]

Kweon, In So [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea

[2] Hanyang Univ, Seoul, South Korea

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01124

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual Question Answering (VQA) is known to be plagued by models that exploit biases in the dataset to make their final predictions. Several previous ensemble-based debiasing methods train an additional model to be deliberately biased so that it can be used to train a robust target model. However, these methods compute the bias for a model simply from the label statistics of the training data or from single-modality branches. In this work, to better learn the bias that a target VQA model suffers from, we propose GenB, a generative method that trains the bias model directly from the target model. In particular, GenB employs a generative network to learn the bias in the target model through a combination of an adversarial objective and knowledge distillation. We then debias the target model using GenB as the bias model. Extensive experiments show the effects of our method on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and we report state-of-the-art results with the LXMERT architecture on VQA-CP2.

Pages: 11681-11690

Page count: 10
