Improving vision transformer for medical image classification via token-wise perturbation

Cited by: 2
Authors
Li, Yuexiang [1 ]
Huang, Yawen [2 ]
He, Nanjun [3 ]
Ma, Kai [2 ]
Zheng, Yefeng [1 ,2 ]
Affiliations
[1] Guangxi Med Univ, Med AI Res MARS Grp, Guangxi Collaborat Innovat Ctr Genom & Personalize, Ctr Genom & Personalized Med,Guangxi Key Lab Genom, Nanning 530021, Peoples R China
[2] Tencent Jarvis Res Ctr, YouTu Lab, Shenzhen 518000, Peoples R China
[3] OPPO, Shenzhen 518000, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Self-supervised learning; Vision transformer; Image classification;
DOI
10.1016/j.jvcir.2023.104022
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Transformer has achieved impressive successes on various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement yielded by ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning (SSL) approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch-embedding tokens under a different perturbation. To maximally exploit the limited medical data with the Transformer, we propose an auxiliary difficulty-ranking task: the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens, simultaneously performing difficulty measurement and maintaining the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet pretrained weights and state-of-the-art SSL approaches.
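The two-branch training described in the abstract can be sketched in a few lines. This is a minimal illustrative mock-up, not the paper's implementation: the function names (`perturb_tokens`, `consistency_loss`), the Gaussian-noise perturbation, and the identity encoders are all assumptions standing in for BOLT's actual Transformer branches, predictor head, and perturbation scheme, which the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_tokens(tokens, strength, rng):
    """Token-wise perturbation (assumed here to be additive Gaussian
    noise) applied independently to each patch-embedding token."""
    return tokens + strength * rng.standard_normal(tokens.shape)

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def consistency_loss(online_pred, target_repr):
    """BYOL-style consistency loss: mean squared error between the
    L2-normalized online predictions and target representations."""
    p, z = l2_normalize(online_pred), l2_normalize(target_repr)
    return float(np.mean(np.sum((p - z) ** 2, axis=-1)))

# Toy patch-embedding tokens: 4 tokens of dimension 8.
tokens = rng.standard_normal((4, 8))

# The online and target branches see the SAME tokens under
# DIFFERENT perturbation strengths.
s_online, s_target = 0.5, 0.1
online_in = perturb_tokens(tokens, s_online, rng)
target_in = perturb_tokens(tokens, s_target, rng)

# Stand-in for the Transformer encoders (identity mapping here);
# in BOLT each branch is a Transformer and the online branch
# additionally carries a predictor head.
loss = consistency_loss(online_in, target_in)

# Auxiliary difficulty-ranking target: which branch received the
# harder (more strongly perturbed) tokens? 1 = online branch.
difficulty_label = int(s_online > s_target)
```

The two signals are complementary: the consistency loss pushes the branches toward perturbation-invariant token representations, while the difficulty-ranking label forces the model to remain sensitive to how strongly the tokens were perturbed.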
Pages: 9
Related Papers (50 total)
  • [1] The Application of Vision Transformer in Image Classification
    He, Zhixuan
    2022 THE 6TH INTERNATIONAL CONFERENCE ON VIRTUAL AND AUGMENTED REALITY SIMULATIONS, ICVARS 2022, 2022, : 56 - 63
  • [2] ATMformer: An Adaptive Token Merging Vision Transformer for Remote Sensing Image Scene Classification
    Niu, Yi
    Song, Zhuochen
    Luo, Qingyu
    Chen, Guochao
    Ma, Mingming
    Li, Fu
    REMOTE SENSING, 2025, 17 (04)
  • [3] MedViT: A robust vision transformer for generalized medical image classification
    Manzari, Omid Nejati
    Ahmadabadi, Hamid
    Kashiani, Hossein
    Shokouhi, Shahriar B.
    Ayatollahi, Ahmad
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 157
  • [4] Vision Transformer (ViT)-based Applications in Image Classification
    Huo, Yingzi
    Jin, Kai
    Cai, Jiahong
    Xiong, Huixuan
    Pang, Jiacheng
    2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS, 2023, : 135 - 140
  • [5] Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification
    Almalik, Faris
    Yaqub, Mohammad
    Nandakumar, Karthik
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT III, 2022, 13433 : 376 - 386
  • [6] RanMerFormer: Randomized vision transformer with token merging for brain tumor classification
    Wang, Jian
    Lu, Si-Yuan
    Wang, Shui-Hua
    Zhang, Yu-Dong
    NEUROCOMPUTING, 2024, 573
  • [7] Vision Transformer with window sequence merging mechanism for image classification
    Jiao, Erjie
    Leng, Qiangkui
    Guo, Jiamei
    Meng, Xiangfu
    Wang, Changzhong
    APPLIED SOFT COMPUTING, 2025, 171
  • [8] FSwin Transformer: Feature-Space Window Attention Vision Transformer for Image Classification
    Yoo, Dayeon
    Kim, Jeesu
    Yoo, Jinwoo
    IEEE ACCESS, 2024, 12 : 72598 - 72606
  • [9] Network Intrusion Detection via Flow-to-Image Conversion and Vision Transformer Classification
    Ho, Chi Mai Kim
    Yow, Kin-Choong
    Zhu, Zhongwen
    Aravamuthan, Sarang
    IEEE ACCESS, 2022, 10 : 97780 - 97793
  • [10] FishAI: Automated hierarchical marine fish image classification with vision transformer
    Yang, Chenghan
    Zhou, Peng
    Wang, Chun-Sheng
    Fu, Ge-Yi
    Xu, Xue-Wei
    Niu, Zhibin
    Zhu, Lin
    Yuan, Ye
    Shen, Hong-Bin
    Pan, Xiaoyong
    ENGINEERING REPORTS, 2024, 6 (12)