Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0
Authors
Hu, Hao [1 ]
Baldassarre, Federico [1 ]
Azizpour, Hossein [1 ]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Keywords
Vision transformer; Transfer learning; Computer vision
DOI
10.1007/978-3-031-26409-2_25
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown remarkable performance on various visual recognition tasks, particularly in self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive bias, which makes them amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when they are fine-tuned on small labeled target datasets. In this work, we therefore make a simple yet effective architectural change: we introduce new learnable masked tokens into vision transformers, reducing the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
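The abstract describes the architectural change only at a high level. The snippet below is a minimal illustrative sketch, not the authors' released implementation, of one way extra learnable mask tokens could be appended to a plain ViT's token sequence; the class name MaskTokenViT and the hyperparameter num_mask_tokens are assumptions introduced here for illustration.

```python
# Minimal sketch (assumption, not the paper's code): a plain ViT encoder where
# a small set of extra learnable "mask" tokens is appended to the patch-token
# sequence before the transformer blocks, so fine-tuning on a small target
# dataset can also adapt these tokens rather than only the patch representations.
import torch
import torch.nn as nn


class MaskTokenViT(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=384,
                 depth=6, heads=6, num_classes=10, num_mask_tokens=4):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Standard patch embedding via a strided convolution.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Extra learnable mask tokens, shared across inputs and appended to
        # every token sequence.
        self.mask_tokens = nn.Parameter(torch.zeros(1, num_mask_tokens, dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, 1 + num_patches + num_mask_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        b = x.shape[0]
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            patches,
                            self.mask_tokens.expand(b, -1, -1)], dim=1)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(tokens[:, 0])  # classify from the [CLS] token


# Usage example with random inputs (shapes only, no real data).
model = MaskTokenViT()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```

In this sketch the mask tokens attend to, and are attended by, the patch tokens in every layer; how many such tokens to add and where exactly they enter the architecture are design choices specified in the paper itself.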
Pages: 409-426
Number of pages: 18
Related papers
50 items in total
  • [21] On Separate Normalization in Self-supervised Transformers
    Chen, Xiaohui
    Wang, Yinkai
    Du, Yuanqi
    Hassoun, Soha
    Liu, Li-Ping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [22] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)
  • [23] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
    Chen, Richard J.
    Chen, Chengkuan
    Li, Yicong
    Chen, Tiffany Y.
    Trister, Andrew D.
    Krishnan, Rahul G.
    Mahmood, Faisal
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16123 - 16134
  • [24] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
  • [25] An Improved Masking Strategy for Self-Supervised Masked Reconstruction in Human Activity Recognition
    Wang, Jinqiang
    Cui, Wenxuan
    Zhu, Tao
    Ning, Huansheng
    Liu, Zhenyu
    IEEE SENSORS JOURNAL, 2024, 24 (11) : 18699 - 18709
  • [26] Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
    Kang, Dahyun
    Koniusz, Piotr
    Cho, Minsu
    Murray, Naila
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19627 - 19638
  • [27] GraphMAE: Self-Supervised Masked Graph Autoencoders
    Hou, Zhenyu
    Liu, Xiao
    Cen, Yukuo
    Dong, Yuxiao
    Yang, Hongxia
    Wang, Chunjie
    Tang, Jie
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 594 - 604
  • [28] MS-DINO: Masked Self-Supervised Distributed Learning Using Vision Transformer
    Park, Sangjoon
    Lee, Ik Jae
    Kim, Jun Won
    Ye, Jong Chul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10) : 6180 - 6192
  • [29] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139
  • [30] PROPERTY NEURONS IN SELF-SUPERVISED SPEECH TRANSFORMERS
    Lin, Tzu-Quan
    Lin, Guan-Ting
    Lee, Hung-Yi
    Tang, Hao
    arXiv,