Improving vision transformer for medical image classification via token-wise perturbation

Cited by: 2
Authors
Li, Yuexiang [1]
Huang, Yawen [2]
He, Nanjun [3]
Ma, Kai [2]
Zheng, Yefeng [1,2]
Affiliations
[1] Guangxi Med Univ, Med AI Res (MARS) Grp, Guangxi Collaborat Innovat Ctr Genom & Personalized Med, Ctr Genom & Personalized Med, Guangxi Key Lab Genom, Nanning 530021, Peoples R China
[2] Tencent Jarvis Res Ctr, YouTu Lab, Shenzhen 518000, Peoples R China
[3] OPPO, Shenzhen 518000, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Self-supervised learning; Vision transformer; Image classification;
DOI
10.1016/j.jvcir.2023.104022
Chinese Library Classification
TP [Automation and Computer Technology];
Subject Classification Code
0812;
Abstract
Transformers have achieved impressive success on various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning (SSL) approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To maximally exploit the Transformer on limited medical data, we propose an auxiliary difficulty ranking task: the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer is driven to distill transformation-invariant features from the perturbed tokens in order to simultaneously measure perturbation difficulty and maintain the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet-pretrained weights and state-of-the-art SSL approaches.
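The record does not include the paper's implementation details, but the mechanism the abstract describes (an online network predicting a momentum target network's representation of the same patch tokens under a different perturbation, plus an auxiliary head that ranks which branch received the harder perturbation) can be sketched roughly in PyTorch as below. Everything in this sketch, including the TokenPerturbation module with Gaussian jitter and token dropping, the mean pooling, the two-layer predictor, the ranking head, and the 0.1 loss weight, is an illustrative assumption and not the authors' actual architecture or hyper-parameters.

```python
# Minimal sketch of the BOLT idea described in the abstract (assumptions throughout).
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenPerturbation(nn.Module):
    """Hypothetical token-wise perturbation: jitters and randomly drops patch tokens."""

    def forward(self, tokens):                                  # tokens: (B, N, D)
        noise = 0.1 * torch.randn_like(tokens)                  # assumed Gaussian jitter
        keep = (torch.rand(tokens.shape[:2], device=tokens.device) > 0.1).float()
        perturbed = tokens * keep.unsqueeze(-1) + noise
        difficulty = noise.abs().mean(dim=(1, 2))               # crude per-sample difficulty score
        return perturbed, difficulty


class BOLTSketch(nn.Module):
    """Online/target branches with a BYOL-style consistency loss and a difficulty-ranking head."""

    def __init__(self, encoder, dim=768, proj_dim=256):
        super().__init__()
        self.online = encoder                                   # trainable Transformer branch
        self.target = copy.deepcopy(encoder)                    # momentum branch, no gradients
        for p in self.target.parameters():
            p.requires_grad = False
        self.predictor = nn.Sequential(
            nn.Linear(dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, dim))
        self.rank_head = nn.Linear(2 * dim, 2)                  # which branch saw the harder tokens?
        self.perturb = TokenPerturbation()

    @torch.no_grad()
    def momentum_update(self, m=0.99):
        # Exponential moving average of the online weights into the target branch.
        for po, pt in zip(self.online.parameters(), self.target.parameters()):
            pt.mul_(m).add_(po.detach(), alpha=1 - m)

    def forward(self, patch_tokens):                            # patch_tokens: (B, N, D)
        v_online, d_online = self.perturb(patch_tokens)         # two differently perturbed views
        v_target, d_target = self.perturb(patch_tokens)
        z_online = self.online(v_online).mean(dim=1)            # pooled token representation
        with torch.no_grad():
            z_target = self.target(v_target).mean(dim=1)
        # Consistency: the online branch predicts the target branch's representation.
        p = F.normalize(self.predictor(z_online), dim=-1)
        t = F.normalize(z_target, dim=-1)
        loss_consistency = (2 - 2 * (p * t).sum(dim=-1)).mean()
        # Auxiliary task: classify which branch received the more difficult perturbation.
        logits = self.rank_head(torch.cat([z_online, z_target], dim=-1))
        labels = (d_target > d_online).long()
        loss_rank = F.cross_entropy(logits, labels)
        return loss_consistency + 0.1 * loss_rank               # assumed loss weighting


if __name__ == "__main__":
    # Stand-in backbone purely to show expected tensor shapes; the paper presumably uses a full ViT.
    layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    model = BOLTSketch(nn.TransformerEncoder(layer, num_layers=2))
    loss = model(torch.randn(4, 196, 768))                      # 4 images, 196 patch tokens each
    loss.backward()
    model.momentum_update()
    print(float(loss))
```

The `__main__` block wires the sketch to a stand-in nn.TransformerEncoder only to demonstrate the expected shapes; in practice one would plug in the actual ViT backbone and the paper's own token-wise perturbation.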
Pages: 9