Dense Semantic Contrast for Self-Supervised Visual Representation Learning

Cited by: 21
Authors
Li, Xiaoni [1, 2]
Zhou, Yu [1, 2]
Zhang, Yifei [1, 2]
Zhang, Aoting [1]
Wang, Wei [1, 2]
Jiang, Ning [3]
Wu, Haiying [3]
Wang, Weiping [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-Supervised Learning; Representation Learning; Contrastive Learning; Dense Representation; Semantics Discovery
DOI
10.1145/3474085.3475551
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised representation learning for visual pre-training has achieved remarkable success with sample (instance or pixel) discrimination and instance-level semantics discovery, yet a non-negligible gap remains between pre-trained models and downstream dense prediction tasks. Concretely, these downstream tasks require more accurate representations: pixels from the same object must belong to a shared semantic category, a property that previous methods lack. In this work, we present Dense Semantic Contrast (DSC), which models semantic category decision boundaries at a dense level to meet the requirements of these tasks. Furthermore, we propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning. Specifically, we explicitly explore the semantic structure of the dataset by mining relations among pixels from different perspectives. For intra-image relation modeling, we discover pixel neighbors from multiple views. For inter-image relations, we enforce pixel representations from the same semantic class to be more similar than representations from different classes within one mini-batch. Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation. Code will be made available.
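The abstract does not give a loss formula, so the following is only a minimal sketch of the inter-image idea it describes: pixel embeddings pooled from one mini-batch are pulled together when they share a semantic class and pushed apart otherwise. It uses a generic supervised-contrastive-style objective, not the authors' implementation. The function name `dense_semantic_contrast_loss`, the `temperature` value, and the assumption that per-pixel pseudo-labels are already available (for example from clustering) are all hypothetical.

```python
import numpy as np


def dense_semantic_contrast_loss(feats, labels, temperature=0.2):
    """Illustrative pixel-level contrastive loss (assumed form, not from the paper).

    feats:  (N, D) array of pixel embeddings gathered from all images in a mini-batch.
    labels: (N,) array of integer pseudo semantic class ids, one per pixel.
    """
    # L2-normalize so that dot products are cosine similarities.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    logits = feats @ feats.T / temperature

    n = feats.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Exclude each pixel's similarity with itself from the softmax.
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over all other pixels in the batch.
    max_logit = logits.max(axis=1, keepdims=True)
    shifted = logits - max_logit
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other pixels with the same pseudo class, from any image.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

With embeddings that already cluster by class, labels that agree with the clusters yield a much lower loss than labels that cut across them, which is the behaviour the abstract's inter-image constraint asks for.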
Pages: 1368-1376
Number of pages: 9
Related papers
50 in total
  • [21] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    Li, Lei
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3023 - 3032
  • [22] Audio-Visual Predictive Coding for Self-Supervised Visual Representation Learning
    Tellamekala, Mani Kumar
    Valstar, Michel
    Pound, Michael
    Giesbrecht, Timo
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9912 - 9919
  • [23] Boost Supervised Pretraining for Visual Transfer Learning: Implications of Self-Supervised Contrastive Representation Learning
    Sun, Jinghan
    Wei, Dong
    Ma, Kai
    Wang, Liansheng
    Zheng, Yefeng
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2307 - 2315
  • [24] Self-supervised Learning for Semantic Sentence Matching with Dense Transformer Inference Network
    Yu, Fengying
    Wang, Jianzong
    Tao, Dewei
    Cheng, Ning
    Xiao, Jing
    WEB AND BIG DATA, APWEB-WAIM 2021, PT I, 2021, 12858 : 258 - 272
  • [25] Comparing Learning Methodologies for Self-Supervised Audio-Visual Representation Learning
    Terbouche, Hacene
    Schoneveld, Liam
    Benson, Oisin
    Othmani, Alice
    IEEE ACCESS, 2022, 10 : 41622 - 41638
  • [26] Whitening for Self-Supervised Representation Learning
    Ermolov, Aleksandr
    Siarohin, Aliaksandr
    Sangineto, Enver
    Sebe, Nicu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [27] Self-Supervised Representation Learning for CAD
    Jones, Benjamin T.
    Hu, Michael
    Kodnongbua, Milin
    Kim, Vladimir G.
    Schulz, Adriana
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21327 - 21336
  • [28] Enhancing motion visual cues for self-supervised video representation learning
    Nie, Mu
    Quan, Zhibin
    Ding, Weiping
    Yang, Wankou
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [29] Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects
    Zhang, Baowen
    Li, Jiahe
    Deng, Xiaoming
    Zhang, Yinda
    Ma, Cuixia
    Wang, Hongan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14222 - 14232
  • [30] MULTI-AUGMENTATION FOR EFFICIENT SELF-SUPERVISED VISUAL REPRESENTATION LEARNING
    Tran, Van Nhiem
    Huang, Chi-En
    Liu, Shen-Hsuan
    Yang, Kai-Lin
    Ko, Timothy
    Li, Yung-Hui
    2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022,