Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Authors
Javed, Syed Ashar [1 ]
Saxena, Shreyas
Gandhi, Vineet [2 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] IIIT Hyderabad, CVIT, Kohli Ctr Intelligent Syst KCIS, Hyderabad, India
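Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract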
Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The intuition behind this idea is to encourage the model to localize to regions which can explain some semantic property in the data; in our case, the property is the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on the Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset, and performance comparable to the state of the art on the Flickr30k dataset.
Pages: 796-802
Page count: 7