Understanding The Robustness in Vision Transformers

Cited by: 0
Authors
Zhou, Daquan [1 ,2 ]
Yu, Zhiding [2 ]
Xie, Enze [3 ]
Xiao, Chaowei [2 ,4 ]
Anandkumar, Anima [2 ,5 ]
Feng, Jiashi [1 ,6 ]
Alvarez, Jose M. [2 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] NVIDIA, Santa Clara, CA 95050 USA
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] ASU, Tempe, AZ USA
[5] CALTECH, Pasadena, CA 91125 USA
[6] ByteDance, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations. We further propose a family of fully attentional networks (FANs) that strengthen this capability by incorporating an attentional channel processing design. We validate the design comprehensively on various hierarchical backbones. Our model achieves a state-of-the-art 87.1% accuracy and 35.8% mCE on ImageNet-1k and ImageNet-C with 76.8M parameters. We also demonstrate state-of-the-art accuracy and robustness in two downstream tasks: semantic segmentation and object detection. Code will be available at https://github.com/NVlabs/FAN.
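The abstract describes an "attentional channel processing design", i.e. applying self-attention across feature channels rather than across spatial tokens. As a rough illustration only (not the paper's exact FAN block; the function name, shapes, and scaling are assumptions for this sketch), channel attention can be written as a scaled dot-product over the channel dimension:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x):
    """Toy channel-wise self-attention (illustrative, not the FAN module).

    x: (n_tokens, n_channels) features from one block.
    Builds a (C x C) channel-affinity matrix and reweights channels with it,
    instead of the usual (N x N) token-to-token attention.
    """
    n, c = x.shape
    affinity = softmax(x.T @ x / np.sqrt(n), axis=-1)  # (C, C) channel affinities
    return x @ affinity                                # (N, C) reweighted features

rng = np.random.default_rng(0)
x = rng.standard_normal((196, 64))  # e.g. 14x14 patch tokens, 64 channels
y = channel_attention(x)
print(y.shape)
```

The output keeps the token layout (196, 64); only the channel mixing changes, which is the aspect the paper attributes improved mid-level representations to.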
Pages: 17
Related Papers
50 records total
  • [21] Understanding and improving adversarial transferability of vision transformers and convolutional neural networks
    Chen, Zhiyu
    Xu, Chi
    Lv, Huanhuan
    Liu, Shangdong
    Ji, Yimu
    INFORMATION SCIENCES, 2023, 648
  • [22] Understanding transformers
    Hickman, I
    ELECTRONICS WORLD, 2001, 107 (1788): 934-936
  • [23] Understanding transformers
    Hickman, I
    ELECTRONICS WORLD, 2001, 107 (1782): 458-461
  • [24] Quantum Vision Transformers
    Cherrat, El Amine
    Kerenidis, Iordanis
    Mathur, Natansh
    Landman, Jonas
    Strahm, Martin
    Li, Yun Yvonna
    QUANTUM, 2024, 8: 1-20
  • [25] Scaling Vision Transformers
    Zhai, Xiaohua
    Kolesnikov, Alexander
    Houlsby, Neil
    Beyer, Lucas
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 12094-12103
  • [26] Transformers in Vision: A Survey
    Khan, Salman
    Naseer, Muzammal
    Hayat, Munawar
    Zamir, Syed Waqas
    Khan, Fahad Shahbaz
    Shah, Mubarak
    ACM COMPUTING SURVEYS, 2022, 54 (10S)
  • [27] Reversible Vision Transformers
    Mangalam, Karttikeya
    Fan, Haoqi
    Li, Yanghao
    Wu, Chao-Yuan
    Xiong, Bo
    Feichtenhofer, Christoph
    Malik, Jitendra
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10820-10830
  • [28] Denoising Vision Transformers
    Yang, Jiawei
    Luo, Katie Z.
    Li, Jiefeng
    Deng, Congyue
    Guibas, Leonidas
    Krishnan, Dilip
    Weinberger, Kilian Q.
    Tian, Yonglong
    Wang, Yue
    COMPUTER VISION - ECCV 2024, PT LXXXV, 2025, 15143: 453-469
  • [29] Multiscale Vision Transformers
    Fan, Haoqi
    Xiong, Bo
    Mangalam, Karttikeya
    Li, Yanghao
    Yan, Zhicheng
    Malik, Jitendra
    Feichtenhofer, Christoph
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 6804-6815
  • [30] Are Transformers More Robust? Towards Exact Robustness Verification for Transformers
    Liao, Brian Hsuan-Cheng
    Cheng, Chih-Hong
    Esen, Hasan
    Knoll, Alois
    COMPUTER SAFETY, RELIABILITY, AND SECURITY, SAFECOMP 2023, 2023, 14181: 89-103