Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

Cited: 0
Authors
Ge, Chongjian [1 ]
Liang, Youwei [2 ]
Song, Yibing [2 ]
Jiao, Jianbo [3 ]
Wang, Jue [2 ]
Luo, Ping [1 ]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent AI Lab, Bellevue, WA 98004 USA
[3] Univ Oxford, Oxford, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders trained via SSL achieve recognition performance comparable to those trained with supervision, their network attention remains under-explored as a route to further improvement. Motivated by transformers, which exploit visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor. T-stream contains two transformers, two projectors, and a predictor. T-stream connects to the CNN encoders and runs in parallel to the rest of C-stream. During training, we perform SSL in both streams simultaneously and use the T-stream output to supervise C-stream. The features from the CNN encoders are modulated in T-stream for visual attention enhancement, making them suitable for the SSL scenario. We use these modulated features to supervise C-stream so that it learns attentive CNN encoders. In this way, we revitalize CNN attention by using transformers as guidance. Experiments on several standard visual recognition benchmarks, including image classification, object detection, and semantic segmentation, show that the proposed CARE framework improves CNN encoder backbones to state-of-the-art performance.
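
To make the two-stream design concrete, below is a minimal PyTorch sketch of the architecture the abstract describes, assuming a BYOL-style setup with a ResNet-50 trunk. The module and function names (Branch, FeatureTransformer, sim_loss, train_step), the MSE term standing in for the T-stream-to-C-stream supervision, and the loss weight lam are illustrative assumptions, not the authors' released implementation.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

def head(in_dim, hidden=512, out=128):
    # Projector/predictor MLP, as in common BYOL-style SSL frameworks.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden),
                         nn.ReLU(inplace=True), nn.Linear(hidden, out))

class FeatureTransformer(nn.Module):
    # T-stream module: self-attention over the CNN's spatial feature map.
    def __init__(self, dim=2048, depth=2, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, depth)

    def forward(self, fmap):                      # fmap: (B, C, H, W)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        return self.enc(tokens).mean(1)           # attention-pooled: (B, C)

class Branch(nn.Module):
    # One branch used by both streams: shared CNN trunk plus C-/T-stream heads.
    def __init__(self, dim=128):
        super().__init__()
        r50 = torchvision.models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(r50.children())[:-2])  # (B, 2048, h, w)
        self.proj_c = head(2048, out=dim)   # C-stream projector
        self.trans = FeatureTransformer()   # T-stream transformer
        self.proj_t = head(2048, out=dim)   # T-stream projector

    def forward(self, x):
        fmap = self.cnn(x)
        c = self.proj_c(fmap.mean(dim=(2, 3)))  # C-stream: pooled CNN feature
        t = self.proj_t(self.trans(fmap))       # T-stream: attention-modulated
        return c, t

def sim_loss(p, z):
    # BYOL-style loss: negative cosine similarity to a stop-gradient target.
    return 2 - 2 * F.cosine_similarity(p, z.detach(), dim=-1).mean()

online = Branch()
target = copy.deepcopy(online).requires_grad_(False)     # momentum (EMA) copy
pred_c, pred_t = head(128, out=128), head(128, out=128)  # predictors

def train_step(x1, x2, lam=1.0):
    # x1, x2: two augmented views of the same image batch.
    c1, t1 = online(x1)
    with torch.no_grad():
        c2, t2 = target(x2)
    loss_c = sim_loss(pred_c(c1), c2)      # SSL objective in C-stream
    loss_t = sim_loss(pred_t(t1), t2)      # SSL objective in T-stream
    loss_a = F.mse_loss(c1, t1.detach())   # T-stream output supervises C-stream
    return loss_c + loss_t + lam * loss_a

In practice the SSL losses are symmetrized over the two views and the target branch is updated as an exponential moving average of the online branch; both details are omitted here for brevity.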
Pages: 14
Related Papers
50 records in total
  • [41] Self-Supervised Representation Learning via Neighborhood-Relational Encoding
    Sabokrou, Mohammad
    Khalooei, Mohammad
    Adeli, Ehsan
    2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 8009-8018
  • [42] Self-supervised learning of Vision Transformers for digital soil mapping using visual data
    Tresson, Paul
    Dumont, Maxime
    Jaeger, Marc
    Borne, Frederic
    Boivin, Stephane
    Marie-Louise, Loic
    Francois, Jeremie
    Boukcim, Hassan
    Goeau, Herve
    Geoderma, 2024, 450
  • [43] Self-Distilled Self-supervised Representation Learning
    Jang, Jiho
    Kim, Seonhoon
    Yoo, Kiyoon
    Kong, Chaerin
    Kim, Jangho
    Kwak, Nojun
    2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 2828-2838
  • [44] Towards Latent Masked Image Modeling for Self-supervised Visual Representation Learning
    Wei, Yibing
    Gupta, Abhinav
    Morgado, Pedro
    Computer Vision - ECCV 2024, Pt XXXIX, 2025, 15097: 1-17
  • [45] solo-learn: A Library of Self-supervised Methods for Visual Representation Learning
    Turrisi da Costa, Victor G.
    Fini, Enrico
    Nabi, Moin
    Sebe, Nicu
    Ricci, Elisa
    Journal of Machine Learning Research, 2022, 23: 1-6
  • [46] Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning
    Song, Kaiyou
    Zhang, Shan
    Luo, Zimeng
    Wang, Tong
    Xie, Jin
    2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023: 16053-16062
  • [48] Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos
    Feng, Zishun
    Tu, Ming
    Xia, Rui
    Wang, Yuxuan
    Krishnamurthy, Ashok
    2020 IEEE International Conference on Big Data (Big Data), 2020: 5671-5672
  • [49] Feature selection and cascade dimensionality reduction for self-supervised visual representation learning
    Qu, Peixin
    Jin, Songlin
    Tian, Yongqin
    Zhou, Ling
    Zheng, Ying
    Zhang, Weidong
    Xu, Yibo
    Pan, Xipeng
    Zhao, Wenyi
    Computers & Electrical Engineering, 2023, 106
  • [50] Self-Supervised Representation Learning using Visual Field Expansion on Digital Pathology
    Boyd, Joseph
    Liashuha, Mykola
    Deutsch, Eric
    Paragios, Nikos
    Christodoulidis, Stergios
    Vakalopoulou, Maria
    2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021), 2021: 639-647