Generative Bias for Robust Visual Question Answering

Cited by: 25

Authors:

Cho, Jae Won [1]

Kim, Dong-Jin [2]

Ryu, Hyeonggon [1]

Kweon, In So [1]

Affiliations:

[1] Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea

[2] Hanyang University, Seoul, South Korea

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01124

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual Question Answering (VQA) models are known to exploit biases in the dataset when making their final predictions. Previous ensemble-based debiasing methods address this by purposefully training an additional biased model, which is then used to train a robust target model. However, these methods compute the bias simply from the label statistics of the training data or from single-modality branches. In this work, to better learn the bias that a target VQA model suffers from, we propose GenB, a generative method that trains the bias model directly from the target model. In particular, GenB employs a generative network to learn the bias in the target model through a combination of an adversarial objective and knowledge distillation. We then debias the target model using GenB as the bias model. Extensive experiments demonstrate the effects of our method on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and we show state-of-the-art results with the LXMERT architecture on VQA-CP2.

Pages: 11681-11690

Page count: 10
