Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0
Authors
Hu, Hao [1 ]
Baldassarre, Federico [1 ]
Azizpour, Hossein [1 ]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023 / Vol. 13715
Keywords
Vision transformer; Transfer learning; Computer vision;
D O I
10.1007/978-3-031-26409-2_25
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive biases, which make transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change, introducing new learnable masked tokens to vision transformers, whereby we reduce the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
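To make the abstract's architectural idea concrete, the sketch below shows one plausible way a shared learnable mask token could replace a random subset of patch tokens before they enter the transformer encoder. This is an illustrative NumPy sketch only, not the paper's exact method: the function name, the 30% mask ratio, and the zero-initialized token are all assumptions for demonstration; in a real model the mask token would be a trainable parameter updated during fine-tuning.

```python
import numpy as np

def apply_mask_tokens(patch_tokens, mask_token, mask_ratio=0.3, rng=None):
    """Replace a random fraction of patch tokens with a shared mask token.

    Illustrative sketch of the learnable-masked-token idea; names and the
    mask_ratio default are assumptions, not taken from the paper.

    patch_tokens: (num_patches, dim) array of patch embeddings
    mask_token:   (dim,) shared vector, broadcast to every masked position
    """
    rng = rng or np.random.default_rng(0)
    num_patches = patch_tokens.shape[0]
    num_masked = int(num_patches * mask_ratio)
    # Sample masked positions without replacement
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    out = patch_tokens.copy()
    out[masked_idx] = mask_token  # same learnable token at every masked slot
    return out, masked_idx

# Example: 196 patches (a 14x14 grid) with 768-dim embeddings, as in ViT-B/16
tokens = np.random.default_rng(1).standard_normal((196, 768))
mask_token = np.zeros(768)  # would be an nn.Parameter-style weight in practice
out, idx = apply_mask_tokens(tokens, mask_token, mask_ratio=0.3)
```

Because the mask token is shared and trainable, masking acts as a learned regularizer during fine-tuning: the model cannot rely on every individual patch, which is one intuition for why it would curb overfitting on small target datasets.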
Pages: 409-426
Page count: 18
Related Papers
47 in total
  • [1] Self-supervised vision transformers for semantic segmentation
    Gu, Xianfan
    Hu, Yingdong
    Wen, Chuan
    Gao, Yang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [2] Self-supervised Vision Transformers for Writer Retrieval
    Raven, Tim
    Matei, Arthur
    Fink, Gernot A.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 380 - 396
  • [3] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
    Cosma, Adrian
    Catruna, Andy
    Radoi, Emilian
    SENSORS, 2023, 23 (05)
  • [4] Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
    Saavedra-Ruiz, Miguel
    Morin, Sacha
    Paull, Liam
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 197 - 204
  • [5] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
    Pang, Liyu
    Li, Xiaoou
    Wang, Zhen
    Lei, Xueyi
    Pei, Yulong
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
  • [6] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [7] Improved transferability of self-supervised learning models through batch normalization finetuning
    Sirotkin, Kirill
    Escudero-Vinolo, Marcos
    Carballeira, Pablo
    Garcia-Martin, Alvaro
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11281 - 11294
  • [8] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
  • [9] MS-DINO: Masked Self-Supervised Distributed Learning Using Vision Transformer
    Park, Sangjoon
    Lee, Ik Jae
    Kim, Jun Won
    Ye, Jong Chul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10) : 6180 - 6192
  • [10] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139