Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Authors
Hu, Hao [1]
Baldassarre, Federico [1]
Azizpour, Hossein [1]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023 / Vol. 13715
Keywords
Vision transformer; Transfer learning; Computer vision;
DOI
10.1007/978-3-031-26409-2_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly in self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive bias, which makes them amenable to learning rich representations from massive amounts of unlabeled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when they are fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change: we introduce new learnable masked tokens to vision transformers, which reduces overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
Pages: 409 - 426
Number of pages: 18