Positional Label for Self-Supervised Vision Transformer

Citations: 0
Authors
Zhang, Zhemin [1]
Gong, Xun [1, 2, 3]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu, Sichuan, Peoples R China
[2] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Beijing, Peoples R China
[3] Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu, Sichuan, Peoples R China
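Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract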
Positional encoding is important for the vision transformer (ViT) to capture the spatial structure of the input image, and its general effectiveness in ViT has been demonstrated. In this work, we propose to train ViT to recognize the positional labels of the patches of the input image. This apparently simple task yields a meaningful self-supervisory signal. Building on previous work on ViT positional encoding, we propose two positional labels dedicated to 2D images: absolute position and relative position. Our positional labels can be easily plugged into various current ViT variants. They can be used in two ways: (a) as an auxiliary training target for vanilla ViT to improve performance; (b) in combination with self-supervised ViT to provide a more powerful self-supervised signal for semantic feature learning. Experiments demonstrate that with the proposed self-supervised methods, ViT-B and Swin-B gain top-1 accuracy improvements of 1.20% and 0.74% on ImageNet, respectively, and improvements of 6.15% and 1.14% on Mini-ImageNet. The code is publicly available at: https://github.com/zhangzhemin/PositionalLabel.
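The idea of absolute and relative positional labels used as an auxiliary target can be illustrated with a minimal sketch. This is not the authors' implementation: the row-major absolute label, the offset-based relative label, the helper names (`absolute_position_labels`, `relative_position_label`, `total_loss`), and the auxiliary weight of 0.5 are all assumptions made for illustration. The abstract does not specify how the labels are encoded or how the losses are combined.

```python
import math

def absolute_position_labels(grid_h, grid_w):
    """Assumed encoding: label each patch with its row-major grid index."""
    return [r * grid_w + c for r in range(grid_h) for c in range(grid_w)]

def relative_position_label(i, j, grid_h, grid_w):
    """Assumed encoding: map the 2D offset from patch i to patch j
    to a single class index in [0, (2*grid_h-1)*(2*grid_w-1))."""
    ri, ci = divmod(i, grid_w)
    rj, cj = divmod(j, grid_w)
    dr = rj - ri + (grid_h - 1)  # row offset shifted to 0 .. 2*grid_h - 2
    dc = cj - ci + (grid_w - 1)  # column offset shifted to 0 .. 2*grid_w - 2
    return dr * (2 * grid_w - 1) + dc

def cross_entropy(logits, target):
    """Numerically stable cross-entropy for one prediction (plain Python)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def total_loss(cls_loss, patch_logits, labels, weight=0.5):
    """Hypothetical combination: main loss plus a weighted mean
    positional-label loss over patches (weight is an assumed value)."""
    aux = sum(cross_entropy(l, t) for l, t in zip(patch_logits, labels))
    return cls_loss + weight * aux / len(labels)

if __name__ == "__main__":
    labels = absolute_position_labels(2, 3)
    print(labels)
    print(relative_position_label(0, 5, 2, 3))
```

In an actual ViT, `patch_logits` would come from a small prediction head applied to each output patch token; here they are plain lists so the sketch stays dependency-free.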
Pages: 3516-3524
Number of pages: 9