Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0
Authors
Hu, Hao [1 ]
Baldassarre, Federico [1 ]
Azizpour, Hossein [1 ]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023, Vol. 13715
Keywords
Vision transformer; Transfer learning; Computer vision
DOI
10.1007/978-3-031-26409-2_25
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive biases, which make transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when fine-tuned on small labeled target datasets. In this work, we therefore make a simple yet effective architectural change: we introduce new learnable masked tokens into vision transformers, reducing the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
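The abstract only states that learnable masked tokens are added to the transformer; it does not specify how they are inserted during fine-tuning. As a minimal conceptual sketch (not the paper's actual method), one common way such a mechanism works is a single shared learnable vector that randomly replaces a fraction of the patch embeddings. The function name, `mask_ratio` value, and replacement scheme below are all illustrative assumptions, shown here with NumPy arrays standing in for learnable parameters:

```python
import numpy as np

def mask_patch_tokens(tokens, mask_token, mask_ratio=0.3, rng=None):
    """Replace a random subset of patch tokens with a shared mask token.

    Hypothetical sketch: in a real model, `mask_token` would be a learnable
    parameter (e.g. torch.nn.Parameter) updated during fine-tuning, and the
    masking would act as a regularizer against overfitting on small datasets.

    tokens:     (num_patches, dim) array of patch embeddings
    mask_token: (dim,) shared vector substituted for masked patches
    mask_ratio: assumed fraction of patches to mask (not from the paper)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    num_patches = tokens.shape[0]
    num_masked = int(round(mask_ratio * num_patches))
    # Choose which patch positions to overwrite, without replacement.
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    out = tokens.copy()
    out[masked_idx] = mask_token
    return out, masked_idx
```

The masked sequence would then be fed to the transformer encoder as usual; at inference time the masking could simply be disabled (`mask_ratio=0`).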
Pages: 409-426 (18 pages)