Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification

Cited by: 20

Authors:

Almalik, Faris [1]

Yaqub, Mohammad [1]

Nandakumar, Karthik [1]

Affiliations:

[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT III | 2022 / Vol. 13433

Keywords:

Adversarial attack; Vision transformer; Self-ensemble;

DOI:

10.1007/978-3-031-16437-8_36

Chinese Library Classification (CLC) number:

R445 [Diagnostic Imaging];

Discipline classification code:

100207;

Abstract:

Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these intermediate feature representations and combining their predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency between the various predictions can also help detect adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy) demonstrate the efficacy of the SEViT architecture in defending against various adversarial attacks in the gray-box setting, where the attacker has full knowledge of the target model but not of the defense mechanism. Code: https://github.com/faresmalik/SEViT
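The two ideas in the abstract (combining intermediate-block predictions with the final ViT prediction, and measuring their consistency to flag adversarial inputs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the combination rule or the consistency measure, so majority voting and mean KL divergence are assumptions made here, and every function name and the `threshold` value are hypothetical. The inputs are plain probability lists standing in for classifier outputs.

```python
import math
from collections import Counter


def argmax(probs):
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def majority_vote(prob_vectors):
    """Majority vote over class predictions (assumed combination rule).

    The final ViT classifier's output is expected last; on a tie its vote
    wins if it is among the tied classes, else the smallest class index.
    """
    votes = [argmax(p) for p in prob_vectors]
    counts = Counter(votes)
    best = max(counts.values())
    tied = [c for c in counts if counts[c] == best]
    return votes[-1] if votes[-1] in tied else min(tied)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_score(intermediate_probs, final_probs):
    """Mean KL divergence between the final prediction and each
    intermediate prediction (assumed consistency measure)."""
    total = sum(kl_divergence(final_probs, q) for q in intermediate_probs)
    return total / len(intermediate_probs)


def ensemble_predict(intermediate_probs, final_probs, threshold=0.5):
    """Return (ensemble label, flagged-as-adversarial).

    threshold is a hypothetical value; in practice it would be tuned
    on held-out clean data.
    """
    label = majority_vote(list(intermediate_probs) + [final_probs])
    flagged = consistency_score(intermediate_probs, final_probs) > threshold
    return label, flagged


if __name__ == "__main__":
    # Toy outputs of three intermediate classifiers on one image.
    intermediate = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
    print(ensemble_predict(intermediate, [0.9, 0.1]))    # final head agrees
    print(ensemble_predict(intermediate, [0.05, 0.95]))  # final head flipped
```

In the second call the final classifier disagrees sharply with the intermediate ones, so the vote still follows the intermediate classifiers while the consistency score rises above the threshold and the sample is flagged.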


Pages: 376 - 386

Page count: 11

