Understanding The Robustness in Vision Transformers

Cited by: 0
Authors
Zhou, Daquan [1 ,2 ]
Yu, Zhiding [2 ]
Xie, Enze [3 ]
Xiao, Chaowei [2 ,4 ]
Anandkumar, Anima [2 ,5 ]
Feng, Jiashi [1 ,6 ]
Alvarez, Jose M. [2 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] NVIDIA, Santa Clara, CA 95050 USA
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] ASU, Tempe, AZ USA
[5] CALTECH, Pasadena, CA 91125 USA
[6] ByteDance, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations. We further propose a family of fully attentional networks (FANs) that strengthen this capability by incorporating an attentional channel processing design. We validate the design comprehensively on various hierarchical backbones. Our model achieves a state-of-the-art 87.1% accuracy and 35.8% mCE on ImageNet-1k and ImageNet-C with 76.8M parameters. We also demonstrate state-of-the-art accuracy and robustness in two downstream tasks: semantic segmentation and object detection. Code will be available at https://github.com/NVlabs/FAN.
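The "attentional channel processing design" mentioned in the abstract replaces the usual per-token MLP channel mixing with self-attention computed across feature channels, so the attention map has shape C x C rather than N x N. The sketch below is a minimal, illustrative NumPy version of that idea only — `channel_attention` and its weight names are invented here for clarity and are not the FAN authors' implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, wq, wk, wv):
    """Self-attention over the channel axis of token features x (N tokens, C channels).

    Unlike standard token attention (an N x N map), the attention matrix here
    is C x C: each output channel is a softmax-weighted mix of input channels.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                  # each (N, C)
    attn = softmax((q.T @ k) / np.sqrt(x.shape[0]))   # (C, C) channel-to-channel weights
    return v @ attn.T                                 # (N, C), same shape as the input

# Toy demo with random weights
rng = np.random.default_rng(0)
N, C = 16, 8
x = rng.standard_normal((N, C))
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))
out = channel_attention(x, wq, wk, wv)                # (16, 8)
```

The single-head, unbatched form above is only meant to show where the channel-wise attention map enters; the paper's hierarchical FAN backbones wrap this kind of operation inside full transformer blocks.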
Pages: 17
Related papers (50 entries)
  • [11] Vision transformers in domain adaptation and domain generalization: a study of robustness
    Alijani, Shadi
    Fayyad, Jamil
    Najjaran, Homayoun
    Neural Computing and Applications, 2024, 36 (29) : 17979 - 18007
  • [12] Scale-space Tokenization for Improving the Robustness of Vision Transformers
    Xu, Lei
    Kawakami, Rei
    Inoue, Nakamasa
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2684 - 2693
  • [13] On the robustness of vision transformers for in-flight monocular depth estimation
    Ercolino, Simone
    Devoto, Alessio
    Monorchio, Luca
    Santini, Matteo
    Mazzaro, Silvio
    Scardapane, Simone
    Industrial Artificial Intelligence, 1 (1):
  • [14] Improving Robustness of Vision Transformers by Reducing Sensitivity to Patch Corruptions
    Guo, Yong
    Stutz, David
    Schiele, Bernt
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 4108 - 4118
  • [15] An Empirical Analysis of Vision Transformers Robustness to Spurious Correlations in Health Data
    Lacerda, Anisio
    Ayala, Daniel
    Malaguth, Francisco
    Kanadani, Fabio
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [16] Improved robustness of vision transformers via prelayernorm in patch embedding
    Kim, Bum Jun
    Choi, Hyeyeon
    Jang, Hyeonah
    Lee, Dong Gu
    Jeong, Wonseok
    Kim, Sang Woo
    PATTERN RECOGNITION, 2023, 141
  • [17] Vision Transformers Show Improved Robustness in High-Content Image Analysis
    Wieser, Mario
    Siegismund, Daniel
    Heyse, Stephan
    Steigele, Stephan
    2022 9TH SWISS CONFERENCE ON DATA SCIENCE (SDS), 2022, : 71 - 72
  • [18] Evaluating and enhancing the robustness of vision transformers against adversarial attacks in medical imaging
    Kanca, Elif
    Ayas, Selen
    Kablan, Elif Baykal
    Ekinci, Murat
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, : 673 - 690
  • [19] SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
    Xiao, Han
    Zheng, Wenzhao
    Zu, Sicheng
    Gao, Peng
    Zhou, Jie
    Lu, Jiwen
    COMPUTER VISION - ECCV 2024, PT XIII, 2025, 15071 : 37 - 54
  • [20] Robustness Tokens: Towards Adversarial Robustness of Transformers
    Pulfer, Brian
    Belousov, Yury
    Voloshynovskiy, Slava
    COMPUTER VISION - ECCV 2024, PT LIX, 2025, 15117 : 110 - 127