Similarity contrastive estimation for image and video soft contrastive self-supervised learning

Authors
Denize, Julien [1, 2]
Rabarisoa, Jaonary [1 ]
Orcesi, Astrid [1 ]
Herault, Romain [2 ]
Affiliations
[1] Univ Paris Saclay, CEA, List, F-91120 Palaiseau, France
[2] Normandie Univ, INSA Rouen, LITIS, F-76801 St Etienne Du Rouvray, France
Keywords
Deep learning; Self-supervised learning; Contrastive; Representation
DOI
10.1007/s00138-023-01444-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Contrastive representation learning has proven to be an effective self-supervised learning method for images and videos. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, that are considered as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should capture relations between instances, that is, their semantic similarity and dissimilarity, which contrastive learning harms by treating all negatives as noise. To circumvent this issue, we propose a novel formulation of contrastive learning that uses semantic similarity between instances, called Similarity Contrastive Estimation (SCE). Our training objective is a soft contrastive one that brings the positives closer and estimates a continuous distribution to push or pull negative instances based on their learned similarities. We empirically validate our approach on both image and video representation learning. We show that SCE performs competitively with the state of the art on the ImageNet linear evaluation protocol with fewer pretraining epochs and that it generalizes to several downstream image tasks. We also show that SCE reaches state-of-the-art results for pretraining video representations and that the learned representation generalizes to video downstream tasks. Source code is available here: https://github.com/juliendenize/eztorch.
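The soft contrastive objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it is one plausible reading, assuming the target distribution mixes a one-hot positive with a softmax over similarities between the other instances in the batch, and that the loss is a cross-entropy against that target. The function names, the mixing weight `lam`, and the temperatures `tau` and `tau_target` are hypothetical choices made for illustration.

```python
import math


def normalize(v):
    """L2-normalize a vector given as a list of floats."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_targets(z2, lam=0.5, tau_target=0.05):
    """Assumed target: lam * one-hot(positive) + (1 - lam) * similarity distribution.

    z2 holds L2-normalized embeddings of the second view (at least two rows).
    The similarity distribution excludes each instance's own entry.
    """
    n = len(z2)
    targets = []
    for i in range(n):
        scores = [-math.inf if j == i else dot(z2[i], z2[j]) / tau_target
                  for j in range(n)]
        sim = softmax(scores)
        targets.append([lam * (1.0 if j == i else 0.0) + (1.0 - lam) * sim[j]
                        for j in range(n)])
    return targets


def soft_contrastive_loss(z1, z2, lam=0.5, tau=0.1, tau_target=0.05):
    """Mean cross-entropy between the soft targets and the view-1 predictions."""
    z1 = [normalize(v) for v in z1]
    z2 = [normalize(v) for v in z2]
    targets = soft_targets(z2, lam, tau_target)
    total = 0.0
    for i, target in enumerate(targets):
        # Predicted distribution: similarity of view 1 of instance i to every view-2 embedding.
        logits = [dot(z1[i], v) / tau for v in z2]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total -= sum(t * (l - log_norm) for t, l in zip(target, logits))
    return total / len(targets)


# Toy usage: two views of three instances, as 2-D embeddings.
view1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view2 = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.8]]
loss = soft_contrastive_loss(view1, view2)
```

Under these assumptions, setting `lam=1.0` removes the similarity term and the loss reduces to a hard, InfoNCE-style contrastive loss in which every other instance is treated as noise; smaller values of `lam` let similar negatives receive part of the target mass.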
Pages: 19