Generative Bias for Robust Visual Question Answering

Cited: 25
Authors
Cho, Jae Won [1 ]
Kim, Dong-Jin [2 ]
Ryu, Hyeonggon [1 ]
Kweon, In So [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Hanyang Univ, Seoul, South Korea
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Keywords
DOI
10.1109/CVPR52729.2023.01124
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make their final predictions. Various ensemble-based debiasing methods have previously been proposed in which an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from the label statistics of the training data or from single-modal branches. In this work, in order to better learn the bias a target VQA model suffers from, we propose GenB, a generative method that trains the bias model directly from the target model. In particular, GenB employs a generative network that learns the bias in the target model through a combination of an adversarial objective and knowledge distillation. We then debias our target model using GenB as the bias model, demonstrate through extensive experiments the effectiveness of our method on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and show state-of-the-art results with the LXMERT architecture on VQA-CP2.
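To make the training scheme the abstract outlines concrete, below is a minimal PyTorch sketch: a generative bias model is fit to the target model with an adversarial objective plus knowledge distillation, and its logits then serve as the bias branch when debiasing the target. Every name here (TargetVQA, BiasGenerator, Discriminator), all layer sizes, the choice of question-only input for the bias model, and the logit-ensemble debiasing loss are illustrative assumptions, not details taken from the paper.

# Minimal sketch of the GenB idea (illustrative assumptions throughout;
# not the authors' implementation). Requires: torch.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_ANS, BATCH = 128, 100, 32  # hypothetical feature/answer/batch sizes

class TargetVQA(nn.Module):
    # Stand-in target model: fuses question and image features into answer logits.
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(2 * DIM, DIM), nn.ReLU(),
                                nn.Linear(DIM, NUM_ANS))
    def forward(self, q, v):
        return self.fc(torch.cat([q, v], dim=-1))

class BiasGenerator(nn.Module):
    # Generative bias model: question feature plus random noise -> answer logits,
    # so it can mimic (and thereby expose) the target model's bias.
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(2 * DIM, DIM), nn.ReLU(),
                                nn.Linear(DIM, NUM_ANS))
    def forward(self, q):
        z = torch.randn(q.size(0), DIM)  # stochastic input of the generator
        return self.fc(torch.cat([q, z], dim=-1))

class Discriminator(nn.Module):
    # Tells target-model logits apart from generated logits (adversarial objective).
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(NUM_ANS, DIM), nn.ReLU(),
                                nn.Linear(DIM, 1))
    def forward(self, logits):
        return self.fc(logits)

target, gen, disc = TargetVQA(), BiasGenerator(), Discriminator()
opt_t = torch.optim.Adam(target.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

q = torch.randn(BATCH, DIM)              # dummy question features
v = torch.randn(BATCH, DIM)              # dummy image features
y = torch.randint(0, NUM_ANS, (BATCH,))  # dummy answer labels

# Step 1: train the discriminator to separate target logits from generated ones.
t_logits = target(q, v).detach()
d_loss = (F.binary_cross_entropy_with_logits(disc(t_logits), torch.ones(BATCH, 1))
          + F.binary_cross_entropy_with_logits(disc(gen(q).detach()),
                                               torch.zeros(BATCH, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Step 2: train the bias generator to fool the discriminator (adversarial) while
# matching the target's output distribution (knowledge distillation).
g_logits = gen(q)
adv_loss = F.binary_cross_entropy_with_logits(disc(g_logits), torch.ones(BATCH, 1))
kd_loss = F.kl_div(F.log_softmax(g_logits, dim=-1),
                   F.softmax(t_logits, dim=-1), reduction="batchmean")
opt_g.zero_grad(); (adv_loss + kd_loss).backward(); opt_g.step()

# Step 3: debias the target using the generator as the bias branch. A simple
# logit-ensemble loss stands in for whatever debiasing loss the paper uses.
bias_logits = gen(q).detach()
ens_loss = F.cross_entropy(target(q, v) + bias_logits, y)
opt_t.zero_grad(); ens_loss.backward(); opt_t.step()

In an actual training loop these three steps would alternate per batch, the intent being that answers the bias branch can already predict contribute less gradient to the target model, pushing it to rely on evidence beyond the dataset bias.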
Pages: 11681-11690
Number of pages: 10
Related Papers
50 records in total
  • [21] Exploring and exploiting model uncertainty for robust visual question answering
    Zhang, Xuesong
    He, Jun
    Zhao, Jia
    Hu, Zhenzhen
    Yang, Xun
    Li, Jia
    Hong, Richang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [22] Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering
    Liang, Zujie
    Jiang, Weitao
    Hu, Haifeng
    Zhu, Jiaying
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 3285-3292
  • [23] Visual Question Answering
    Nada, Ahmed
    Chen, Min
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024: 6-10
  • [24] Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering
    Lu, Qiwen
    Chen, Shengbo
    Zhu, Xiaoke
    JOURNAL OF IMAGING, 2024, 10 (03)
  • [25] VTQAGen: BART-based Generative Model For Visual Text Question Answering
    Chen, Haoru
    Wan, Tianjiao
    Lin, Zhimin
    Xu, Kele
    Wang, Jin
    Wang, Huaimin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 9456-9461
  • [26] Generative Attention Model with Adversarial Self-learning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017: 415-423
  • [27] Generative retrieval for conversational question answering
    Li, Yongqi
    Yang, Nan
    Wang, Liang
    Wei, Furu
    Li, Wenjie
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (05)
  • [28] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 1472-1479
  • [29] Be flexible! learn to debias by sampling and prompting for robust visual question answering
    Liu, Jin
    Fan, ChongFeng
    Zhou, Fengyu
    Xu, Huijuan
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [30] Reducing Multi-model Biases for Robust Visual Question Answering
    Zhang, F.
    Li, Y.
    Li, X.
    Xu, J.
    Chen, Y.
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, 60 (01): 23-33