Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0
Authors
Hu, Hao [1 ]
Baldassarre, Federico [1 ]
Azizpour, Hossein [1 ]
Affiliations
[1] KTH Royal Inst Technol, Stockholm, Sweden
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023 / Vol. 13715
Keywords
Vision transformer; Transfer learning; Computer vision;
D O I
10.1007/978-3-031-26409-2_25
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly for self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive biases, which make transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change, introducing new learnable masked tokens to vision transformers, whereby we reduce the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.
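To make the abstract's architectural idea concrete, the sketch below shows one plausible way a shared learnable mask token could replace a random subset of patch tokens before they enter the transformer encoder. This is an illustrative NumPy sketch only, not the paper's exact method: the function name, the 30% mask ratio, and the zero-initialized token are all assumptions for demonstration; in a real model the mask token would be a trainable parameter updated during fine-tuning.

```python
import numpy as np

def apply_mask_tokens(patch_tokens, mask_token, mask_ratio=0.3, rng=None):
    """Replace a random fraction of patch tokens with a shared mask token.

    Illustrative sketch of the learnable-masked-token idea; names and the
    mask_ratio default are assumptions, not taken from the paper.

    patch_tokens: (num_patches, dim) array of patch embeddings
    mask_token:   (dim,) shared vector, broadcast to every masked position
    """
    rng = rng or np.random.default_rng(0)
    num_patches = patch_tokens.shape[0]
    num_masked = int(num_patches * mask_ratio)
    # Sample masked positions without replacement
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    out = patch_tokens.copy()
    out[masked_idx] = mask_token  # same learnable token at every masked slot
    return out, masked_idx

# Example: 196 patches (a 14x14 grid) with 768-dim embeddings, as in ViT-B/16
tokens = np.random.default_rng(1).standard_normal((196, 768))
mask_token = np.zeros(768)  # would be an nn.Parameter-style weight in practice
out, idx = apply_mask_tokens(tokens, mask_token, mask_ratio=0.3)
```

Because the mask token is shared and trainable, masking acts as a learned regularizer during fine-tuning: the model cannot rely on every individual patch, which is one intuition for why it would curb overfitting on small target datasets.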
Pages: 409-426
Page count: 18
Related Papers
47 in total
  • [1] Self-supervised vision transformers for semantic segmentation
    Gu, Xianfan
    Hu, Yingdong
    Wen, Chuan
    Gao, Yang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [2] Self-supervised Vision Transformers for Writer Retrieval
    Raven, Tim
    Matei, Arthur
    Fink, Gernot A.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 380 - 396
  • [3] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
    Cosma, Adrian
    Catruna, Andy
    Radoi, Emilian
    SENSORS, 2023, 23 (05)
  • [4] Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
    Saavedra-Ruiz, Miguel
    Morin, Sacha
    Paull, Liam
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 197 - 204
  • [5] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
    Pang, Liyu
    Li, Xiaoou
    Wang, Zhen
    Lei, Xueyi
    Pei, Yulong
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
  • [6] Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
    Karpov, Aleksei
    Makarov, Ilya
    2022 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR 2022), 2022, : 711 - 719
  • [7] Improved transferability of self-supervised learning models through batch normalization finetuning
    Sirotkin, Kirill
    Escudero-Vinolo, Marcos
    Carballeira, Pablo
    Garcia-Martin, Alvaro
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11281 - 11294
  • [8] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
  • [9] MS-DINO: Masked Self-Supervised Distributed Learning Using Vision Transformer
    Park, Sangjoon
    Lee, Ik Jae
    Kim, Jun Won
    Ye, Jong Chul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10) : 6180 - 6192
  • [10] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139