Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0
Authors
Hu, Hao [1]
Baldassarre, Federico [1]
Azizpour, Hossein [1]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Keywords
Vision transformer; Transfer learning; Computer vision
DOI
10.1007/978-3-031-26409-2_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, specifically for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive bias, which makes transformers amenable to learning rich representations from massive amounts of unlabeled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when they are fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change by introducing new learnable masked tokens to vision transformers, whereby we reduce the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
Pages: 409-426
Number of pages: 18