Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Citations: 0
|
Authors
Hu, Hao [1 ]
Baldassarre, Federico [1 ]
Azizpour, Hossein [1 ]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Keywords
Vision transformer; Transfer learning; Computer vision;
DOI
10.1007/978-3-031-26409-2_25
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive bias, which makes transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change by introducing new learnable masked tokens to vision transformers, whereby we reduce the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
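The abstract's core idea, extra learnable tokens appended to the ViT input sequence before fine-tuning, can be illustrated with a minimal PyTorch sketch. This is an illustrative reconstruction, not the authors' code: the module name, the number of masked tokens, and the tiny encoder configuration are all assumptions chosen for brevity.

```python
# Hypothetical sketch of the paper's idea: learnable "masked" tokens are
# concatenated with the class token and the patch tokens, then processed by
# a standard transformer encoder. All hyperparameters here are illustrative.
import torch
import torch.nn as nn

class MaskedTokenViT(nn.Module):
    def __init__(self, embed_dim=192, num_patches=196,
                 num_masked_tokens=4, depth=2, num_heads=3):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # The proposed architectural change: new learnable masked tokens.
        self.masked_tokens = nn.Parameter(torch.zeros(1, num_masked_tokens, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, 1 + num_masked_tokens + num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, patch_tokens):  # patch_tokens: (B, num_patches, embed_dim)
        b = patch_tokens.shape[0]
        cls = self.cls_token.expand(b, -1, -1)
        masked = self.masked_tokens.expand(b, -1, -1)
        # Prepend class and masked tokens, add positional embeddings, encode.
        x = torch.cat([cls, masked, patch_tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        return x[:, 0]  # class-token representation for the downstream head

model = MaskedTokenViT()
out = model(torch.randn(2, 196, 192))
print(out.shape)  # torch.Size([2, 192])
```

During fine-tuning, the masked tokens give the model extra learnable capacity that is decoupled from the image content, which is one plausible reading of how they temper overfitting on small target datasets.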
Pages: 409-426
Page count: 18
Related Papers
50 records total
  • [31] Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency
    Prabhu, Viraj
    Yenamandra, Sriram
    Singh, Aaditya
    Hoffman, Judy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [32] Self-supervised learning of Vision Transformers for digital soil mapping using visual data
    Tresson, Paul
    Dumont, Maxime
    Jaeger, Marc
    Borne, Frederic
    Boivin, Stephane
    Marie-Louise, Loic
    Francois, Jeremie
    Boukcim, Hassan
    Goeau, Herve
    GEODERMA, 2024, 450
  • [33] Guiding Attention for Self-Supervised Learning with Transformers
    Deshpande, Ameet
    Narasimhan, Karthik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4676 - 4686
  • [34] MaskVO: Self-Supervised Visual Odometry with a Learnable Dynamic Mask
    Xuan, Weihao
    Ren, Ruijie
    Wu, Siyuan
    Chen, Changhao
    2022 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII 2022), 2022, : 225 - 231
  • [35] Self-Supervised Vision for Climate Downscaling
    Singh, Karandeep
    Jeong, Chaeyoon
    Shidqi, Naufal
    Park, Sungwon
    Nellikkatti, Arjun
    Zeller, Elke
    Cha, Meeyoung
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 7456 - 7464
  • [36] Understanding Self-Attention of Self-Supervised Audio Transformers
    Yang, Shu-wen
    Liu, Andy T.
    Lee, Hung-yi
    INTERSPEECH 2020, 2020, : 3785 - 3789
  • [37] Masked Discrimination for Self-supervised Learning on Point Clouds
    Liu, Haotian
    Cai, Mu
    Lee, Yong Jae
    COMPUTER VISION - ECCV 2022, PT II, 2022, 13662 : 657 - 675
  • [38] A Masked Self-Supervised Pretraining Method for Face Parsing
    Li, Zhuang
    Cao, Leilei
    Wang, Hongbin
    Xu, Lihong
    MATHEMATICS, 2022, 10 (12)
  • [39] Self-supervised pseudo-colorizing of masked cells
    Wagner, Royden
    Lopez, Carlos Fernandez
    Stiller, Christoph
    PLOS ONE, 2023, 18 (08)
  • [40] MST: Masked Self-Supervised Transformer for Visual Representation
    Li, Zhaowen
    Chen, Zhiyang
    Yang, Fan
    Li, Wei
    Zhu, Yousong
    Zhao, Chaoyang
    Deng, Rui
    Wu, Liwei
    Zhao, Rui
    Tang, Ming
    Wang, Jinqiao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34