Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Authors
Zhi, Xi [1, 2]
Mao, Zhendong [3]
Liu, Chunxiao [1, 2]
Zhang, Peng [1]
Wang, Bin [4]
Zhang, Yongdong [3]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Xiaomi Inc, Xiaomi AI Lab, Beijing, Peoples R China
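Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract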
Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) with high-frequency answers (e.g., yellow) while ignoring the image content. Existing approaches tackle this problem by creating delicate models or introducing additional visual annotations to reduce question dependency and strengthen image dependency. However, they are still subject to the language prior problem since the data biases have not been fundamentally addressed. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and then propose a self-supervised auxiliary task that uses the balanced data to help the VQA model overcome language priors. Our method can compensate for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method achieves state-of-the-art performance, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark, VQA-CP v2. In other words, we can increase the performance of annotation-based methods by 16% without using external annotations. Our code is available on GitHub.
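The 16% figure in the abstract appears to be a relative improvement rather than a gain in percentage points. The short check below works through the arithmetic from the two accuracies quoted above; the variable names are illustrative only, and treating 49.50% as the annotation-based baseline follows the abstract's own wording.

```python
# Accuracies on VQA-CP v2 as quoted in the abstract (percent).
baseline_acc = 49.50  # prior best, per the abstract
new_acc = 57.59       # reported by the proposed method

# Gain in percentage points.
absolute_gain = new_acc - baseline_acc

# Gain relative to the baseline accuracy.
relative_gain = absolute_gain / baseline_acc

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1%}")
```

Under this reading, the improvement is about 8 percentage points in absolute terms, which corresponds to roughly 16% relative to the baseline.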
Pages: 1083-1089
Number of pages: 7