Improving transferable adversarial attack for vision transformers via global attention and local drop

Cited by: 3
Authors
Li, Tuo [1]
Han, Yahong [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
Keywords
Adversarial examples; Vision transformer; Transferability; Self-attention
DOI
10.1007/s00530-023-01157-z
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Vision Transformers (ViTs) have become a new paradigm in several computer vision tasks, yet they are susceptible to adversarial examples. Recent studies show that adversarial examples generated on ViTs transfer poorly to other models, because existing methods do not target the structural characteristics specific to ViTs (e.g., self-attention and patch embedding). To address this problem, we propose Global Attention and Local Drop (GALD), a method that boosts the transferability of adversarial examples from ViTs to other models, including other ViTs and convolutional neural networks (CNNs). Our method contains two parts: Global Attention Guidance (GAG) and Drop Patch (DP). GAG improves the attention representation in shallow layers by adding global guidance attention to every layer of the ViT except the final one, so that the perturbations focus on the object regions. DP randomly drops some patches in every iteration to diversify the input patterns and mitigate overfitting of the adversarial examples to the surrogate model. Experiments show that adversarial examples generated by our method achieve the best transferability to black-box models with unknown structures. Code is available at Link.
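The abstract describes DP only at a high level. Below is a minimal sketch of how such a patch-dropping step might look inside an iterative attack; the function name drop_patch, the 16-pixel patch size, and the 10% drop ratio are illustrative assumptions, not the authors' implementation.

# A minimal, hypothetical sketch of the Drop Patch (DP) idea: before each
# attack iteration, randomly zero out a fraction of the input's
# non-overlapping patches to diversify input patterns. The function name,
# patch size, and drop ratio are assumptions for illustration only.
import torch

def drop_patch(x: torch.Tensor, patch_size: int = 16, drop_ratio: float = 0.1) -> torch.Tensor:
    """Randomly zero out a `drop_ratio` fraction of patch_size x patch_size patches.

    x: input batch of shape (B, C, H, W); H and W must be divisible by patch_size.
    """
    b, c, h, w = x.shape
    ph, pw = h // patch_size, w // patch_size
    # One keep/drop decision per patch, shared across channels.
    keep = (torch.rand(b, 1, ph, pw, device=x.device) >= drop_ratio).float()
    # Upsample the patch-level mask to pixel resolution.
    mask = keep.repeat_interleave(patch_size, dim=2).repeat_interleave(patch_size, dim=3)
    return x * mask

# Illustrative use inside an iterative (I-FGSM-style) attack loop; the mask
# multiplication is differentiable, so gradients flow back to x_adv:
#   for _ in range(num_steps):
#       x_div = drop_patch(x_adv, patch_size=16, drop_ratio=0.1)
#       loss = criterion(model(x_div), y)
#       grad = torch.autograd.grad(loss, x_adv)[0]
#       ...

Because a fresh random mask is drawn at every iteration, the perturbation cannot rely on any single fixed set of patches, which is one plausible way the input-diversification effect described in the abstract could be realized.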
Pages: 3467-3480
Page count: 14