LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering

Cited by: 21
Authors
Liang, Zujie [1]
Hu, Haifeng [1]
Zhu, Jiaying [1]
Affiliation
[1] Sun Yat-sen University, School of Electronics and Information Technology, Guangzhou, People's Republic of China
Source
SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Keywords
Visual Question Answering; Unbiased Learning; Language Prior
DOI
10.1145/3404835.3462981
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most existing Visual Question Answering (VQA) systems tend to rely excessively on language bias and therefore fail to reason from visual clues. To address this issue, we propose a novel Language-Prior Feedback (LPF) objective function that re-balances the proportion of each answer's loss value in the total VQA loss. LPF first uses a question-only branch to calculate a modulating factor that measures the language bias. It then assigns a self-adaptive weight to each training sample during training. With this reweighting mechanism, LPF reshapes the total VQA loss into a more balanced form, so that samples whose answers require visual information are used more effectively during training. Our method is simple to implement, model-agnostic, and end-to-end trainable. Extensive experiments show that LPF (1) brings significant improvements over various VQA models and (2) achieves competitive performance on the bias-sensitive VQA-CP v2 benchmark.
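The abstract does not state LPF's exact formula, so the following is only a minimal sketch of one plausible reading of the reweighting step. It assumes, in a focal-loss style, that the modulating factor alpha is the question-only branch's softmax probability for the ground-truth answer, and that each sample's VQA cross-entropy is scaled by (1 - alpha) ** gamma. The function names `softmax` and `lpf_style_loss`, the exponent `gamma`, and the single-label cross-entropy are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lpf_style_loss(
    vqa_logits: Sequence[Sequence[float]],
    q_only_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 1.0,
) -> float:
    """Mean re-weighted cross-entropy over a batch (hypothetical sketch).

    Assumption: alpha is the question-only branch's probability of the
    ground-truth answer, and the weight is (1 - alpha) ** gamma.
    """
    losses = []
    for v_logits, q_logits, t in zip(vqa_logits, q_only_logits, targets):
        # How strongly the question alone predicts the correct answer.
        alpha = softmax(q_logits)[t]
        # Language-biased samples (alpha near 1) get a weight near 0.
        weight = (1.0 - alpha) ** gamma
        # Standard single-label cross-entropy of the full VQA model.
        ce = -math.log(softmax(v_logits)[t])
        losses.append(weight * ce)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    vqa = [[2.0, 0.0, 0.0]]
    biased_q = [[5.0, 0.0, 0.0]]   # question alone already "knows" the answer
    uniform_q = [[0.0, 0.0, 0.0]]  # question alone is uninformative
    print(lpf_style_loss(vqa, biased_q, [0], gamma=2.0))
    print(lpf_style_loss(vqa, uniform_q, [0], gamma=2.0))
```

With identical VQA predictions, the sample that the question-only branch already answers confidently contributes far less loss, so training emphasis shifts toward samples that need visual evidence. Setting `gamma=0` recovers plain cross-entropy. In a real training loop, alpha would presumably be treated as a constant (detached from the graph) and the question-only branch would need its own training signal; both points are assumptions, since the abstract does not specify them.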
Pages: 1955-1959
Page count: 5