Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification

Cited by: 20
Authors
Almalik, Faris [1]
Yaqub, Mohammad [1]
Nandakumar, Karthik [1]
Affiliation
[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT III | 2022 / Vol. 13433
Keywords
Adversarial attack; Vision transformer; Self-ensemble
DOI
10.1007/978-3-031-16437-8_36
Chinese Library Classification number
R445 [Diagnostic imaging]
Discipline classification code
100207
Abstract
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these intermediate feature representations and combining their predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency between the various predictions can also help detect adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy) demonstrate the efficacy of the SEViT architecture in defending against various adversarial attacks in the gray-box setting (the attacker has full knowledge of the target model, but not of the defense mechanism). Code: https://github.com/faresmalik/SEViT
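The two ideas in the abstract (combining intermediate-block predictions with the final prediction, and using their consistency to flag adversarial inputs) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. It assumes the per-classifier class probabilities have already been computed, and the specific choices here are assumptions made for illustration: majority voting as the combination rule, mean KL divergence as the consistency measure, and the function names and threshold value, all of which are hypothetical.

```python
import math
from collections import Counter


def argmax(probs):
    """Index of the largest probability in a list."""
    return max(range(len(probs)), key=probs.__getitem__)


def ensemble_predict(final_probs, intermediate_probs):
    """Majority vote over the final classifier and the intermediate classifiers.

    Assumption: majority voting is one possible combination rule; the
    abstract does not state which rule is used.
    """
    votes = [argmax(p) for p in [final_probs, *intermediate_probs]]
    return Counter(votes).most_common(1)[0][0]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, clipped to avoid log(0)."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q))


def consistency_score(final_probs, intermediate_probs):
    """Mean KL divergence between the final prediction and each intermediate one.

    Assumption: KL divergence is used here as an example of a consistency
    measure; a larger score means the predictions disagree more.
    """
    scores = [kl_divergence(final_probs, q) for q in intermediate_probs]
    return sum(scores) / len(scores)


def is_adversarial(final_probs, intermediate_probs, threshold=0.5):
    """Flag the input when the predictions disagree by more than a threshold.

    The threshold value is hypothetical and would need tuning on held-out data.
    """
    return consistency_score(final_probs, intermediate_probs) > threshold


# Hypothetical two-class outputs of three intermediate classifiers.
intermediate = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]

# Final classifier agrees with the intermediate ones (clean-looking input).
clean_final = [0.8, 0.2]
# Final classifier has flipped to the other class (attack-like input).
attacked_final = [0.1, 0.9]

print(ensemble_predict(attacked_final, intermediate))
print(is_adversarial(clean_final, intermediate))
print(is_adversarial(attacked_final, intermediate))
```

In this toy setup the vote still follows the intermediate classifiers when only the final prediction flips, and the disagreement between the final and intermediate predictions is what triggers the detection flag.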
Pages: 376-386
Number of pages: 11