Improving vision transformer for medical image classification via token-wise perturbation

Cited by: 2
Authors
Li, Yuexiang [1]
Huang, Yawen [2]
He, Nanjun [3]
Ma, Kai [2]
Zheng, Yefeng [1, 2]
Affiliations
[1] Guangxi Med Univ, Med AI Res MARS Grp, Guangxi Collaborat Innovat Ctr Genom & Personalize, Ctr Genom & Personalized Med,Guangxi Key Lab Genom, Nanning 530021, Peoples R China
[2] Tencent Jarvis Res Ctr, YouTu Lab, Shenzhen 518000, Peoples R China
[3] OPPO, Shenzhen 518000, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Self-supervised learning; Vision transformer; Image classification
DOI
10.1016/j.jvcir.2023.104022
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Transformers have achieved impressive success in various computer vision tasks. However, most existing studies need to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement yielded by ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning (SSL) approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer with limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer strives to distill transformation-invariant features from the perturbed tokens so as to simultaneously measure difficulty and maintain the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet pretrained weights and state-of-the-art SSL approaches.
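The two objectives described in the abstract (online-to-target consistency and difficulty ranking) can be illustrated with a rough NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the perturbation is additive Gaussian noise with a per-sample strength, the "backbone" is mean pooling followed by a linear map rather than a Transformer, the consistency term is a BYOL-style cosine loss, the ranking term is a binary cross-entropy, and the target weights follow an exponential moving average. No gradient step is performed; the sketch only evaluates the losses once.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(tokens, strength, rng):
    # ASSUMPTION: token-wise perturbation modeled as additive Gaussian noise.
    return tokens + strength * rng.standard_normal(tokens.shape)

def encode(tokens, weight):
    # ASSUMPTION: stand-in for the Transformer backbone (mean-pool + linear map).
    return tokens.mean(axis=1) @ weight

def consistency_loss(online_out, target_out):
    # BYOL-style loss: 2 - 2 * cosine similarity, averaged over the batch.
    p = online_out / np.linalg.norm(online_out, axis=1, keepdims=True)
    z = target_out / np.linalg.norm(target_out, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))

def ranking_loss(logits, online_is_harder):
    # ASSUMPTION: difficulty ranking cast as binary cross-entropy on
    # "did the online branch receive the stronger perturbation?".
    probs = 1.0 / (1.0 + np.exp(-logits))
    y = online_is_harder.astype(float)
    eps = 1e-12
    return float(-np.mean(y * np.log(probs + eps) + (1.0 - y) * np.log(1.0 - probs + eps)))

def ema_update(target_w, online_w, momentum=0.99):
    # Target weights slowly track the online weights.
    return momentum * target_w + (1.0 - momentum) * online_w

batch, n_tokens, dim = 4, 16, 8
tokens = rng.standard_normal((batch, n_tokens, dim))  # patch embedding tokens
online_w = rng.standard_normal((dim, dim))
target_w = online_w.copy()
rank_w = rng.standard_normal(2 * dim)                  # hypothetical ranking head

# Each branch perturbs the same tokens with its own strength.
s_online = rng.uniform(0.0, 1.0, size=batch)
s_target = rng.uniform(0.0, 1.0, size=batch)
online_out = encode(perturb(tokens, s_online[:, None, None], rng), online_w)
target_out = encode(perturb(tokens, s_target[:, None, None], rng), target_w)

# The ranking head sees both branch outputs and guesses which input was harder.
logits = np.concatenate([online_out, target_out], axis=1) @ rank_w
loss_c = consistency_loss(online_out, target_out)
loss_r = ranking_loss(logits, s_online > s_target)
total_loss = loss_c + loss_r

target_w = ema_update(target_w, online_w)
print(f"consistency={loss_c:.4f} ranking={loss_r:.4f} total={total_loss:.4f}")
```

In a real training loop the target output would be treated as a constant (no gradient), only the online weights would be optimized on `total_loss`, and the relative weighting of the two terms would be a tunable choice; none of these details are specified in the abstract.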
Pages: 9
Related papers
50 in total
  • [21] Improving Chicken Disease Classification Based on Vision Transformer and Combine with Integrated Gradients Explanation
    Huong Hoang Luong
    Triet Minh Nguyen
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (04) : 1236 - 1249
  • [22] DUAL TRANSFORMER ENCODER MODEL FOR MEDICAL IMAGE CLASSIFICATION
    Yan, Fangyuan
    Yan, Bin
    Pei, Mingtao
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 690 - 694
  • [23] Token-Selective Vision Transformer for fine-grained image recognition of marine organisms
    Si, Guangzhe
    Xiao, Ying
    Wei, Bin
    Bullock, Leon Bevan
    Wang, Yueyue
    Wang, Xiaodong
    FRONTIERS IN MARINE SCIENCE, 2023, 10
  • [24] Effects of JPEG Compression on Vision Transformer Image Classification for Encryption-then-Compression Images
    Hamano, Genki
    Imaizumi, Shoko
    Kiya, Hitoshi
    SENSORS, 2023, 23 (07)
  • [25] IEViT: An enhanced vision transformer architecture for chest X-ray image classification
    Okolo, Gabriel Iluebe
    Katsigiannis, Stamos
    Ramzan, Naeem
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2022, 226
  • [26] RViT: Robust Fusion Vision Transformer with Variational Hierarchical Denoising Process for Image Classification
    Lin, Zhenghong
    Wu, Yuze
    Chen, Jiawei
    Wang, Shiping
    GUIDANCE NAVIGATION AND CONTROL, 2024, 04 (03)
  • [27] Image Classification of Tree Species in Relatives Based on Dual-Branch Vision Transformer
    Wang, Qi
    Dong, Yanqi
    Xu, Nuo
    Xu, Fu
    Mou, Chao
    Chen, Feixiang
    FORESTS, 2024, 15 (12)
  • [28] CvTMorph: Improving Local Feature Extraction in Medical Image Registration for Respiratory Motion Modeling with Convolutional Vision Transformer
    Chen, Peizhi
    Zou, Xupeng
    Gou, Yifan
    CURRENT MEDICAL IMAGING, 2024, 20 : e15734056302592
  • [29] Res-MGCA-SE: a lightweight convolutional neural network based on vision transformer for medical image classification
    Soleimani-Fard S.
    Ko S.-B.
    Neural Computing and Applications, 2024, 36 (28) : 17631 - 17644
  • [30] REVIEW OF VISION TRANSFORMER MODELS FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION
    Lv, Pengyuan
    Wu, Wenjun
    Zhong, Yanfei
    Zhang, Liangpei
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2231 - 2234