Robust data augmentation and contrast learning for debiased visual question answering

Cited by: 0
Authors
Ning, Ke [1 ,2 ]
Li, Zhixin [1 ,2 ]
Affiliations
[1] Guangxi Normal Univ, Key Lab Educ Blockchain & Intelligent Technol, Minist Educ, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Language priors; Data augmentation; Knowledge distillation; Contrastive learning;
DOI
10.1016/j.neucom.2025.129527
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The language prior problem in visual question answering (VQA) leads models to predict answers directly from spurious correlations between questions and answers, so their performance drops sharply out of distribution. Current debiasing methods often achieve good out-of-distribution generalization at the cost of significant in-distribution degradation, while non-debiasing methods sacrifice a large amount of out-of-distribution performance to achieve high in-distribution accuracy. We propose a novel method combining multi-teacher knowledge distillation and contrastive learning (MKDCL) to address the language prior problem in VQA. We propose a Question Answer Selection (QAS) module that selects reasonable questions for images and determines pseudo answers from the teachers' weighted predictions. We also propose a Contrastive Learning Samples Generation (CLSG) module that synthesizes four types of positive and negative samples in the visual and language modalities for contrastive learning, effectively increasing the model's semantic dependency on the images while avoiding performance degradation caused by spurious question-answer correlations. Our method is model-agnostic and achieves state-of-the-art performance (62.93%) on the language-prior-sensitive VQA-CP v2 dataset while maintaining performance (65.43%) on the VQA v2 dataset.
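The abstract describes two ingredients: pseudo answers obtained from a weighted combination of multiple teachers' predictions, and a contrastive objective over synthesized positive/negative samples. The paper does not specify the exact weighting scheme or loss in the abstract; the sketch below is only illustrative, using a simple weighted average of teacher answer distributions for the pseudo label and a standard InfoNCE-style contrastive loss. All function names and weights here are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def weighted_pseudo_answer(teacher_logits, teacher_weights):
    """Hypothetical sketch: fuse several teachers' answer logits into one
    pseudo answer by averaging their softmax distributions with given
    (non-negative) weights, then taking the argmax as the pseudo label."""
    probs = np.stack([softmax(l) for l in teacher_logits])  # (T, num_answers)
    w = np.asarray(teacher_weights, dtype=float)
    w = w / w.sum()                                          # normalize weights
    mixed = (w[:, None] * probs).sum(axis=0)                 # weighted mixture
    return int(mixed.argmax())                               # pseudo answer index

def info_nce(anchor, positive, negatives, tau=0.07):
    """Standard InfoNCE contrastive loss for one anchor embedding:
    pull the positive sample close, push the negatives away."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                                   # stability shift
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

For example, a confident teacher favoring answer 0 with weight 0.9 dominates a weaker teacher favoring answer 1, so the pseudo answer is 0; the loss is small when the anchor is much closer to its positive than to the negatives.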
Pages: 11