Positional Label for Self-Supervised Vision Transformer

Cited by: 0

Authors:

Zhang, Zhemin [1]

Gong, Xun [1,2,3]

Affiliations:

[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu, Sichuan, Peoples R China

[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Beijing, Peoples R China

[3] Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu, Sichuan, Peoples R China

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3 | 2023

Funding:

National Natural Science Foundation of China

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Positional encoding is important for the vision transformer (ViT) to capture the spatial structure of the input image, and its general effectiveness in ViT has been demonstrated. In this work, we propose training ViT to recognize the positional labels of the patches of the input image; this apparently simple task yields a meaningful self-supervisory signal. Building on previous work on ViT positional encoding, we propose two positional labels dedicated to 2D images: an absolute position label and a relative position label. Our positional labels can be easily plugged into various current ViT variants and can work in two ways: (a) as an auxiliary training target for vanilla ViT to improve performance, and (b) combined with self-supervised ViT to provide a more powerful self-supervised signal for semantic feature learning. Experiments demonstrate that with the proposed self-supervised methods, ViT-B and Swin-B gain improvements of 1.20% and 0.74% (top-1 accuracy) on ImageNet, respectively, and of 6.15% and 1.14% on Mini-ImageNet. The code is publicly available at: https://github.com/zhangzhemin/PositionalLabel.
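
Illustrative sketch (not the authors' implementation): the abstract does not specify how the positional labels are encoded, so the following Python example only shows one plausible reading. It assumes that the absolute label of a patch is its row-major index in the patch grid, that the relative label of a patch pair is the class id of its (row, column) offset, and that the auxiliary objective is a per-patch cross-entropy. All function names are hypothetical.

```python
import math


def absolute_position_labels(grid_h, grid_w):
    """Assumed absolute label: the row-major index of each patch."""
    return [r * grid_w + c for r in range(grid_h) for c in range(grid_w)]


def relative_position_label(i, j, grid_h, grid_w):
    """Assumed relative label: class id of the offset from patch i to patch j."""
    ri, ci = divmod(i, grid_w)
    rj, cj = divmod(j, grid_w)
    dy = rj - ri + (grid_h - 1)  # shifted into 0 .. 2*grid_h - 2
    dx = cj - ci + (grid_w - 1)  # shifted into 0 .. 2*grid_w - 2
    return dy * (2 * grid_w - 1) + dx


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for one patch's position logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def auxiliary_position_loss(patch_logits, labels):
    """Mean cross-entropy over all patches (the assumed auxiliary target)."""
    losses = [cross_entropy(l, t) for l, t in zip(patch_logits, labels)]
    return sum(losses) / len(losses)


# Usage: a 2x2 patch grid with uninformative (all-zero) position logits.
labels = absolute_position_labels(2, 2)
uniform_logits = [[0.0] * len(labels) for _ in labels]
loss = auxiliary_position_loss(uniform_logits, labels)  # equals log(4)
```

In a real training loop the logits would come from a classification head applied to the ViT patch tokens, and this loss would be added to the main objective with a weighting factor; both of those choices are assumptions here rather than details taken from the abstract.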

Pages: 3516-3524

Page count: 9
