Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification

Cited by: 20
Authors
Almalik, Faris [1 ]
Yaqub, Mohammad [1 ]
Nandakumar, Karthik [1 ]
Affiliations
[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT III | 2022 / Vol. 13433
Keywords
Adversarial attack; Vision transformer; Self-ensemble;
DOI
10.1007/978-3-031-16437-8_36
CLC Number
R445 [Diagnostic Imaging];
Discipline Code
100207;
Abstract
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by the initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these intermediate feature representations and combining their predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency between the various predictions can also help detect adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy) demonstrate the efficacy of the SEViT architecture in defending against various adversarial attacks in the gray-box (attacker has full knowledge of the target model, but not the defense mechanism) setting. Code: https://github.com/faresmalik/SEViT
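The abstract describes two mechanisms: combining intermediate-classifier predictions with the final ViT prediction, and flagging samples whose predictions are inconsistent as potentially adversarial. A minimal sketch of that decision logic, assuming a simple majority vote and an agreement threshold (both the voting rule and the threshold are illustrative assumptions, not the paper's exact formulation):

```python
from collections import Counter

def sevit_predict(intermediate_preds, final_pred, consistency_threshold=0.5):
    """Hypothetical sketch of SEViT-style self-ensembling.

    intermediate_preds: class labels predicted by classifiers attached to
    early ViT blocks; final_pred: the final ViT head's label. The ensemble
    label is the majority vote, and a sample is flagged as adversarial when
    the fraction of agreeing classifiers falls below the threshold.
    """
    votes = list(intermediate_preds) + [final_pred]
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)  # fraction of classifiers that agree
    return label, agreement < consistency_threshold

# Clean-looking sample: all classifiers agree, nothing is flagged.
print(sevit_predict([1, 1, 1, 1], 1))  # (1, False)
# Suspicious sample: early blocks disagree with the final head, so the
# low consistency flags it as potentially adversarial.
print(sevit_predict([0, 0, 1, 2], 1))
```

In the actual method the intermediate classifiers are trained on the feature representations of the early transformer blocks, which the paper observes are relatively robust to adversarial perturbations; the sketch only shows how their label outputs would be fused and checked for consistency.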
Pages: 376-386
Page count: 11
Related Papers
50 items total
  • [31] TransMCGC: a recast vision transformer for small-scale image classification tasks
    Xiang, Jian-Wen
    Chen, Min-Rong
    Li, Pei-Shan
    Zou, Hao-Li
    Li, Shi-Da
    Huang, Jun-Jie
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (10): 7697-7718
  • [32] Refined Feature-Space Window Attention Vision Transformer for Image Classification
    Yoo D.
    Yoo J.
    Transactions of the Korean Institute of Electrical Engineers, 2024, 73 (06): 1004-1011
  • [33] Stroke Disease Classification Using CT Scan Image with Vision Transformer Method
    Yopiangga, Alfian Prisma
    Badriyah, Tessy
    Syarif, Iwan
    Sakinah, Nur
    2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024, 2024: 436-441
  • [34] Effects of JPEG Compression on Vision Transformer Image Classification for Encryption-then-Compression Images
    Hamano, Genki
    Imaizumi, Shoko
    Kiya, Hitoshi
    SENSORS, 2023, 23 (07)
  • [35] IEViT: An enhanced vision transformer architecture for chest X-ray image classification
    Okolo, Gabriel Iluebe
    Katsigiannis, Stamos
    Ramzan, Naeem
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2022, 226
  • [36] ATMformer: An Adaptive Token Merging Vision Transformer for Remote Sensing Image Scene Classification
    Niu, Yi
    Song, Zhuochen
    Luo, Qingyu
    Chen, Guochao
    Ma, Mingming
    Li, Fu
    REMOTE SENSING, 2025, 17 (04)
  • [37] Hyperspectral Image Classification Based on Multi-stage Vision Transformer with Stacked Samples
    Chen, Xiaoyue
    Kamata, Sei-Ichiro
    Zhou, Weilian
    2021 IEEE REGION 10 CONFERENCE (TENCON 2021), 2021: 441-446
  • [38] Image Classification of Tree Species in Relatives Based on Dual-Branch Vision Transformer
    Wang, Qi
    Dong, Yanqi
    Xu, Nuo
    Xu, Fu
    Mou, Chao
    Chen, Feixiang
    FORESTS, 2024, 15 (12):
  • [39] MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets
    Du, Siyi
    Bayasi, Nourhan
    Hamarneh, Ghassan
    Garbi, Rafeef
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IV, 2023, 14223: 448-458
  • [40] WFSS: weighted fusion of spectral transformer and spatial self-attention for robust hyperspectral image classification against adversarial attacks
    Lichun Tang
    Zhaoxia Yin
    Hang Su
    Wanli Lyu
    Bin Luo
    Visual Intelligence, 2 (1):