Generative Bias for Robust Visual Question Answering

Cited: 25
Authors
Cho, Jae Won [1 ]
Kim, Dong-Jin [2 ]
Ryu, Hyeonggon [1 ]
Kweon, In So [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Hanyang Univ, Seoul, South Korea
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Keywords
DOI
10.1109/CVPR52729.2023.01124
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make their final predictions. Various ensemble-based debiasing methods have previously been proposed in which an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from the label statistics of the training data or from single-modal branches. In this work, in order to better learn the bias a target VQA model suffers from, we propose GenB, a generative method that trains the bias model directly from the target model. In particular, GenB employs a generative network that learns the bias in the target model through a combination of an adversarial objective and knowledge distillation. We then debias our target model using GenB as the bias model, demonstrate through extensive experiments the effectiveness of our method on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and show state-of-the-art results with the LXMERT architecture on VQA-CP2.
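To make the training scheme the abstract outlines concrete, below is a minimal PyTorch sketch: a generative bias model is fit to the target model with an adversarial objective plus knowledge distillation, and its logits then serve as the bias branch when debiasing the target. Every name here (TargetVQA, BiasGenerator, Discriminator), all layer sizes, the choice of question-only input for the bias model, and the logit-ensemble debiasing loss are illustrative assumptions, not details taken from the paper.

# Minimal sketch of the GenB idea (illustrative assumptions throughout;
# not the authors' implementation). Requires: torch.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_ANS, BATCH = 128, 100, 32  # hypothetical feature/answer/batch sizes

class TargetVQA(nn.Module):
    # Stand-in target model: fuses question and image features into answer logits.
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(2 * DIM, DIM), nn.ReLU(),
                                nn.Linear(DIM, NUM_ANS))
    def forward(self, q, v):
        return self.fc(torch.cat([q, v], dim=-1))

class BiasGenerator(nn.Module):
    # Generative bias model: question feature plus random noise -> answer logits,
    # so it can mimic (and thereby expose) the target model's bias.
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(2 * DIM, DIM), nn.ReLU(),
                                nn.Linear(DIM, NUM_ANS))
    def forward(self, q):
        z = torch.randn(q.size(0), DIM)  # stochastic input of the generator
        return self.fc(torch.cat([q, z], dim=-1))

class Discriminator(nn.Module):
    # Tells target-model logits apart from generated logits (adversarial objective).
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(NUM_ANS, DIM), nn.ReLU(),
                                nn.Linear(DIM, 1))
    def forward(self, logits):
        return self.fc(logits)

target, gen, disc = TargetVQA(), BiasGenerator(), Discriminator()
opt_t = torch.optim.Adam(target.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

q = torch.randn(BATCH, DIM)              # dummy question features
v = torch.randn(BATCH, DIM)              # dummy image features
y = torch.randint(0, NUM_ANS, (BATCH,))  # dummy answer labels

# Step 1: train the discriminator to separate target logits from generated ones.
t_logits = target(q, v).detach()
d_loss = (F.binary_cross_entropy_with_logits(disc(t_logits), torch.ones(BATCH, 1))
          + F.binary_cross_entropy_with_logits(disc(gen(q).detach()),
                                               torch.zeros(BATCH, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Step 2: train the bias generator to fool the discriminator (adversarial) while
# matching the target's output distribution (knowledge distillation).
g_logits = gen(q)
adv_loss = F.binary_cross_entropy_with_logits(disc(g_logits), torch.ones(BATCH, 1))
kd_loss = F.kl_div(F.log_softmax(g_logits, dim=-1),
                   F.softmax(t_logits, dim=-1), reduction="batchmean")
opt_g.zero_grad(); (adv_loss + kd_loss).backward(); opt_g.step()

# Step 3: debias the target using the generator as the bias branch. A simple
# logit-ensemble loss stands in for whatever debiasing loss the paper uses.
bias_logits = gen(q).detach()
ens_loss = F.cross_entropy(target(q, v) + bias_logits, y)
opt_t.zero_grad(); ens_loss.backward(); opt_t.step()

In an actual training loop these three steps would alternate per batch, the intent being that answers the bias branch can already predict contribute less gradient to the target model, pushing it to rely on evidence beyond the dataset bias.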
Pages: 11681-11690
Number of pages: 10
Related Papers
50 records in total
  • [21] Exploring and exploiting model uncertainty for robust visual question answering
    Zhang, Xuesong
    He, Jun
    Zhao, Jia
    Hu, Zhenzhen
    Yang, Xun
    Li, Jia
    Hong, Richang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [22] Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering
    Liang, Zujie
    Jiang, Weitao
    Hu, Haifeng
    Zhu, Jiaying
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 3285-3292
  • [23] Visual Question Answering
    Nada, Ahmed
    Chen, Min
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024: 6-10
  • [24] Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering
    Lu, Qiwen
    Chen, Shengbo
    Zhu, Xiaoke
    JOURNAL OF IMAGING, 2024, 10 (03)
  • [25] VTQAGen: BART-based Generative Model For Visual Text Question Answering
    Chen, Haoru
    Wan, Tianjiao
    Lin, Zhimin
    Xu, Kele
    Wang, Jin
    Wang, Huaimin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 9456-9461
  • [26] Generative Attention Model with Adversarial Self-learning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017: 415-423
  • [27] Generative retrieval for conversational question answering
    Li, Yongqi
    Yang, Nan
    Wang, Liang
    Wei, Furu
    Li, Wenjie
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (05)
  • [28] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 1472-1479
  • [29] Be flexible! learn to debias by sampling and prompting for robust visual question answering
    Liu, Jin
    Fan, ChongFeng
    Zhou, Fengyu
    Xu, Huijuan
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [30] Reducing Multi-model Biases for Robust Visual Question Answering
    Zhang, F.
    Li, Y.
    Li, X.
    Xu, J.
    Chen, Y.
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, 60 (01): 23-33