Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers

Cited by: 0

Authors:

Hu, Hao [1]

Baldassarre, Federico [1]

Azizpour, Hossein [1]

Affiliations:

[1] KTH Royal Inst Technol, Stockholm, Sweden

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III | 2023 / Vol. 13715

Keywords:

Vision transformer; Transfer learning; Computer vision

DOI:

10.1007/978-3-031-26409-2_25

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Vision transformers have recently shown remarkable performance in various visual recognition tasks, particularly in self-supervised representation learning. The key advantage of transformers for self-supervised learning, compared to their convolutional counterparts, is their reduced inductive bias, which makes transformers amenable to learning rich representations from massive amounts of unlabelled data. On the other hand, this flexibility makes self-supervised vision transformers susceptible to overfitting when they are fine-tuned on small labeled target datasets. Therefore, in this work, we make a simple yet effective architectural change by introducing new learnable masked tokens to vision transformers, whereby we reduce the effect of overfitting in transfer learning while retaining the desirable flexibility of vision transformers. Through several experiments based on two seminal self-supervised vision transformers, SiT and DINO, and several small target visual recognition tasks, we show consistent and significant improvements in the accuracy of the fine-tuned models across all target tasks.


Pages: 409-426

Page count: 18
