LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering

Cited by: 21

Authors:

Liang, Zujie [1]

Hu, Haifeng [1]

Zhu, Jiaying [1]

Affiliations:

[1] Sun Yat-sen University, School of Electronics and Information Technology, Guangzhou, People's Republic of China

Source:

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021

Keywords:

Visual Question Answering; Unbiased Learning; Language Prior;

DOI:

10.1145/3404835.3462981

Chinese Library Classification (CLC) code:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Most existing Visual Question Answering (VQA) systems tend to rely too heavily on language bias and hence fail to reason from visual clues. To address this issue, we propose a novel Language-Prior Feedback (LPF) objective function that re-balances the proportion of each answer's loss value in the total VQA loss. The LPF first calculates a modulating factor to determine the language bias using a question-only branch. Then, the LPF assigns a self-adaptive weight to each training sample during training. With this reweighting mechanism, the LPF ensures that the total VQA loss is reshaped into a more balanced form. In this way, the samples that require visual information to answer are used efficiently during training. Our method is simple to implement, model-agnostic, and end-to-end trainable. We conduct extensive experiments, and the results show that the LPF (1) brings a significant improvement over various VQA models and (2) achieves competitive performance on the bias-sensitive VQA-CP v2 benchmark.
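
The reweighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the focal-loss-style weight (1 - alpha)^gamma used here is an assumption, as are the single-label answer setting and the names `lpf_weighted_loss`, `alpha`, and `gamma`. The sketch only shows how a confident question-only branch could shrink a sample's share of the loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lpf_weighted_loss(vqa_logits, question_only_logits, answer_index, gamma=1.0):
    """Return (weighted loss, weight) for one training sample.

    Assumed form, not taken from the abstract:
      alpha  = probability the question-only branch gives the true answer
      weight = (1 - alpha) ** gamma   (hypothetical modulating factor)
      loss   = weight * cross-entropy of the full VQA prediction
    """
    p_vqa = softmax(vqa_logits)
    p_question_only = softmax(question_only_logits)
    alpha = p_question_only[answer_index]
    weight = (1.0 - alpha) ** gamma
    cross_entropy = -math.log(p_vqa[answer_index])
    return weight * cross_entropy, weight


# Same VQA prediction, two different question-only predictions.
vqa_logits = [2.0, 1.0, 0.1]
# Question alone already predicts the answer: strong language bias.
biased_sample = lpf_weighted_loss(vqa_logits, [5.0, 0.0, 0.0], answer_index=0)
# Question alone is uninformative: the answer needs visual evidence.
visual_sample = lpf_weighted_loss(vqa_logits, [0.0, 0.0, 0.0], answer_index=0)
```

Under these assumptions, the sample that the question-only branch answers confidently receives a much smaller weight than the sample that needs visual evidence, which matches the re-balancing behaviour the abstract describes.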


Pages: 1955-1959

Page count: 5

5 records in total

[1] Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Guo, Yangyang
Cheng, Zhiyong
Nie, Liqiang
Liu, Yibing
Wang, Yinglong
Kankanhalli, Mohan
PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 75 - 84
[2] Zero-shot Visual Question Answering with Language Model Feedback
Du, Yifan
Li, Junyi
Tang, Tianyi
Zhao, Wayne Xin
Wen, Ji-Rong
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9268 - 9281
[3] Handling language prior and compositional reasoning issues in Visual Question Answering system
Chowdhury, Souvik
Soni, Badal
NEUROCOMPUTING, 2025, 635
[4] ZVQAF: Zero-shot visual question answering with feedback from large language models
Liu, Cheng
Wang, Chao
Peng, Yan
Li, Zhixu
NEUROCOMPUTING, 2024, 580
[5] Learning the Meanings of Function Words From Grounded Language Using a Visual Question Answering Model
Portelance, Eva
Frank, Michael C.
Jurafsky, Dan
COGNITIVE SCIENCE, 2024, 48 (05)
