72 entries in total
[1]
Aldahdooh Ahmed, 2021, arXiv:2106.03734
[2]
ViViT: A Video Vision Transformer [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 6816-6826
[3]
Bai Yutong, 2021, arXiv:2111.05464
[4]
Understanding Robustness of Transformers for Image Classification [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 10211-10221
[5]
Brown T.B., 2017, Adversarial patch
[6]
Transformer Interpretability Beyond Attention Visualization [J]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 782-791
[7]
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 347-356
[8]
Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches [J]. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020: 3413-3421
[9]
Chen Z, 2020, IEEE WINT CONF APPL, P863, DOI 10.1109/WACV45572.2020.9093610
[10]
Chen Zhi, 2021, P 28 ACM INT C MULT