Understanding The Robustness in Vision Transformers

Cited by: 0
Authors
Zhou, Daquan [1 ,2 ]
Yu, Zhiding [2 ]
Xie, Enze [3 ]
Xiao, Chaowei [2 ,4 ]
Anandkumar, Anima [2 ,5 ]
Feng, Jiashi [1 ,6 ]
Alvarez, Jose M. [2 ]
Affiliations
[1] National University of Singapore, Singapore
[2] NVIDIA, Santa Clara, CA 95050, USA
[3] University of Hong Kong, Hong Kong, China
[4] Arizona State University, Tempe, AZ, USA
[5] California Institute of Technology, Pasadena, CA 91125, USA
[6] ByteDance, Beijing, China
Keywords: (none listed)
DOI: (none available)
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations. We further propose a family of fully attentional networks (FANs) that strengthen this capability by incorporating an attentional channel processing design. We validate the design comprehensively on various hierarchical backbones. Our model achieves a state-of-the-art 87.1% accuracy and 35.8% mCE on ImageNet-1k and ImageNet-C with 76.8M parameters. We also demonstrate state-of-the-art accuracy and robustness in two downstream tasks: semantic segmentation and object detection. Code will be available at https://github.com/NVlabs/FAN.
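The abstract's key design idea, attentional channel processing, replaces static channel mixing with self-attention computed across feature channels. Below is a minimal sketch of such a channel self-attention block in PyTorch. It is illustrative only: the module and parameter names are assumptions, and the normalization and block placement may differ from the authors' released implementation at the repository above.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Self-attention over the channel dimension (illustrative sketch)."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, tokens, channels)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each: (B, heads, head_dim, N)
        # Channel-to-channel affinities: a (head_dim x head_dim) map per head,
        # aggregated over tokens -- the transpose of ordinary token attention.
        attn = (q @ k.transpose(-2, -1)) * (N ** -0.5)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)

# Usage: reweight 196 patch tokens with 384 channels.
x = torch.randn(2, 196, 384)
print(ChannelAttention(384)(x).shape)  # torch.Size([2, 196, 384])

Because the attention map here is channel-by-channel rather than token-by-token, its cost scales with feature width rather than sequence length; consult the official FAN code for the exact formulation used in the paper.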
Pages: 17