Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

Cited by: 0
Authors
Ge, Chongjian [1 ]
Liang, Youwei [2 ]
Song, Yibing [2 ]
Jiao, Jianbo [3 ]
Wang, Jue [2 ]
Luo, Ping [1 ]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent AI Lab, Bellevue, WA 98004 USA
[3] Univ Oxford, Oxford, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders trained via SSL achieve recognition performance comparable to those trained via supervised learning, their network attention is under-explored for further improvement. Motivated by transformers, which exploit visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor. T-stream contains two transformers, two projectors, and a predictor. T-stream connects to the CNN encoders and runs in parallel to the remaining C-stream. During training, we perform SSL in both streams simultaneously and use the T-stream output to supervise C-stream. The features from the CNN encoders are modulated in T-stream for visual attention enhancement and become suitable for the SSL scenario. We use these modulated features to supervise C-stream for learning attentive CNN encoders. In this way, we revitalize CNN attention by using transformers as guidance. Experiments on several standard visual recognition benchmarks, including image classification, object detection, and semantic segmentation, show that the proposed CARE framework improves CNN encoder backbones to state-of-the-art performance.
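The abstract describes a concrete dual-stream architecture, which a small sketch can make precise. Below is a minimal PyTorch sketch of the online branch of each stream under stated assumptions: the names (MLPHead, CARESketch, attention_supervision_loss), the head sizes, the two-layer transformer, and the BYOL-style cosine loss are illustrative choices of ours, not the authors' released code; the momentum (target) branches of both streams are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPHead(nn.Module):
    """Projector/predictor head (two-layer MLP), as is common in BYOL-style SSL."""
    def __init__(self, in_dim, hidden_dim=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class CARESketch(nn.Module):
    """Online branches of C-stream and T-stream sharing one CNN backbone (sketch)."""
    def __init__(self, cnn_encoder, feat_dim=2048):
        super().__init__()
        self.encoder = cnn_encoder  # CNN backbone returning a (B, C, H, W) feature map
        # T-stream: a small transformer that modulates the CNN feature map via attention.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.c_projector = MLPHead(feat_dim)           # C-stream projector
        self.t_projector = MLPHead(feat_dim)           # T-stream projector
        self.c_predictor = MLPHead(128, out_dim=128)   # C-stream predictor
        self.t_predictor = MLPHead(128, out_dim=128)   # T-stream predictor

    def forward(self, x):
        fmap = self.encoder(x)                         # (B, C, H, W)
        c_feat = fmap.mean(dim=(2, 3))                 # global average pooling for C-stream
        tokens = fmap.flatten(2).transpose(1, 2)       # (B, H*W, C) tokens for T-stream
        t_feat = self.transformer(tokens).mean(dim=1)  # attention-modulated features
        z_c = self.c_predictor(self.c_projector(c_feat))
        z_t = self.t_predictor(self.t_projector(t_feat))
        return z_c, z_t

def attention_supervision_loss(z_c, z_t):
    """T-stream output supervises C-stream: align normalized embeddings (BYOL-style)."""
    return 2 - 2 * F.cosine_similarity(z_c, z_t.detach(), dim=-1).mean()

# Illustrative usage with a ResNet-50 trunk (avgpool and fc layers removed):
#   import torchvision
#   trunk = nn.Sequential(*list(torchvision.models.resnet50().children())[:-2])
#   model = CARESketch(trunk)
#   z_c, z_t = model(torch.randn(4, 3, 224, 224))
#   loss = attention_supervision_loss(z_c, z_t)

In the full framework as described, each stream would additionally maintain its own SSL objective over two augmented views with momentum-updated target branches, and a term like the one above would be added to the C-stream loss so that the attention-modulated T-stream features guide the CNN encoder.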
Pages: 14
Related Papers
50 items in total
  • [31] Can Semantic Labels Assist Self-Supervised Visual Representation Learning?
    Wei, Longhui
    Xie, Lingxi
    He, Jianzhong
    Zhang, Xiaopeng
    Tian, Qi
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2642 - 2650
  • [32] Multi-Augmentation for Efficient Self-Supervised Visual Representation Learning
    Tran, Van Nhiem
    Huang, Chi-En
    Liu, Shen-Hsuan
    Yang, Kai-Lin
    Ko, Timothy
    Li, Yung-Hui
2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022
  • [33] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
    Chen, Richard J.
    Chen, Chengkuan
    Li, Yicong
    Chen, Tiffany Y.
    Trister, Andrew D.
    Krishnan, Rahul G.
    Mahmood, Faisal
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16123 - 16134
  • [34] Towards Pointsets Representation Learning via Self-Supervised Learning and Set Augmentation
    Arsomngern, Pattaramanee
    Long, Cheng
    Suwajanakorn, Supasorn
    Nutanong, Sarana
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 1201 - 1216
  • [35] Self-supervised Video Hashing via Bidirectional Transformers
    Li, Shuyan
    Li, Xiu
    Lu, Jiwen
    Zhou, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13544 - 13553
  • [36] Stereo Depth Estimation via Self-supervised Contrastive Representation Learning
    Tukra, Samyakh
    Giannarou, Stamatia
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VII, 2022, 13437 : 604 - 614
  • [37] Self-Supervised Facial Motion Representation Learning via Contrastive Subclips
    Sun, Zheng
    Torrie, Shad A.
    Sumsion, Andrew W.
    Lee, Dah-Jye
    ELECTRONICS, 2023, 12 (06)
  • [38] Self-Supervised Video Representation Learning via Latent Time Navigation
    Yang, Di
    Wang, Yaohui
    Kong, Quan
    Dantcheva, Antitza
    Garattoni, Lorenzo
    Francesca, Gianpiero
    Bremond, Francois
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3118 - 3126
  • [39] DocMAE: Document Image Rectification via Self-supervised Representation Learning
    Liu, Shaokai
    Feng, Hao
    Zhou, Wengang
    Li, Houqiang
    Liu, Cong
    Wu, Feng
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1613 - 1618
  • [40] MetricBERT: Text Representation Learning via Self-Supervised Triplet Training
    Malkiel, Itzik
    Ginzburg, Dvir
    Barkan, Oren
    Caciularu, Avi
    Weill, Yoni
    Koenigstein, Noam
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8142 - 8146