Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0
Authors
Hu, Hao [1 ]
Baldassarre, Federico [1 ]
Azizpour, Hossein [1 ]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023, Vol. 13715
Keywords
Vision transformer; Transfer learning; Computer vision
DOI
10.1007/978-3-031-26409-2_25
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive biases, which make transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when fine-tuned on small labeled target datasets. In this work, we therefore make a simple yet effective architectural change: we introduce new learnable masked tokens into vision transformers, reducing the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
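The abstract only states that learnable masked tokens are added to the transformer; it does not specify how they are inserted during fine-tuning. As a minimal conceptual sketch (not the paper's actual method), one common way such a mechanism works is a single shared learnable vector that randomly replaces a fraction of the patch embeddings. The function name, `mask_ratio` value, and replacement scheme below are all illustrative assumptions, shown here with NumPy arrays standing in for learnable parameters:

```python
import numpy as np

def mask_patch_tokens(tokens, mask_token, mask_ratio=0.3, rng=None):
    """Replace a random subset of patch tokens with a shared mask token.

    Hypothetical sketch: in a real model, `mask_token` would be a learnable
    parameter (e.g. torch.nn.Parameter) updated during fine-tuning, and the
    masking would act as a regularizer against overfitting on small datasets.

    tokens:     (num_patches, dim) array of patch embeddings
    mask_token: (dim,) shared vector substituted for masked patches
    mask_ratio: assumed fraction of patches to mask (not from the paper)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    num_patches = tokens.shape[0]
    num_masked = int(round(mask_ratio * num_patches))
    # Choose which patch positions to overwrite, without replacement.
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    out = tokens.copy()
    out[masked_idx] = mask_token
    return out, masked_idx
```

The masked sequence would then be fed to the transformer encoder as usual; at inference time the masking could simply be disabled (`mask_ratio=0`).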
Pages: 409-426 (18 pages)