A Feature Map Adversarial Attack Against Vision Transformers

Cited by: 0
Authors
Altoub, Majed [1 ]
Mehmood, Rashid [2 ]
AlQurashi, Fahad [1 ]
Alqahtany, Saad [2 ]
Alsulami, Bassma [1 ]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia
[2] Islamic Univ Madinah, Fac Comp & Informat Syst, Dept Comp Sci, Madinah 42351, Saudi Arabia
Keywords
Vision transformers; adversarial attacks; DNNs; vulnerabilities; feature maps; perturbations; spatial domains; frequency domains;
DOI
10.14569/IJACSA.2024.0151097
CLC number
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
Image classification is a domain where Deep Neural Networks (DNNs) have demonstrated remarkable achievements. Recently, Vision Transformers (ViTs) have shown potential in handling large-scale image classification challenges by scaling to higher resolutions and accommodating larger input sizes more efficiently than traditional Convolutional Neural Networks (CNNs). However, ViTs remain vulnerable to adversarial attacks. Feature maps serve as the foundation for representing and extracting meaningful information from images: while CNNs excel at capturing local features and spatial relationships, ViTs are better at modeling global context and long-range dependencies. This paper proposes a feature-map-based, ViT-specific adversarial example attack called the Feature Map ViT-specific Attack (FMViTA). The objective is to generate adversarial perturbations in the spatial and frequency domains of the image representation that maximize the distance between the feature maps of the perturbed and target images. The experiments focus on a pre-trained ViT model fine-tuned on the ImageNet dataset. The proposed attack demonstrates the vulnerability of ViTs to adversarial examples: even with a maximum perturbation magnitude of only 0.02 added to the input samples, it achieves a 100% attack success rate.
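The core idea described in the abstract, generating a bounded perturbation that drives the feature-map distance between the perturbed and target images apart, can be illustrated with a minimal sketch. This is not the authors' FMViTA implementation: the linear `features` extractor, the step size `alpha`, and the PGD-style sign-gradient loop are all illustrative assumptions standing in for the ViT feature maps; only the 0.02 L-infinity bound comes from the paper.

```python
# Hedged sketch of a feature-map attack under an L-infinity bound of 0.02.
# A toy linear map stands in for ViT feature maps (hypothetical stand-in);
# the loop is generic projected sign-gradient ascent, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.02                            # maximum perturbation magnitude (from the paper)
W = rng.standard_normal((16, 64))     # toy "feature extractor" weights (assumption)

def features(x):
    """Stand-in for the model's feature maps."""
    return W @ x

def feature_map_attack(x, x_target, steps=10, alpha=0.005):
    """Ascend the squared feature-map distance to the target, keeping |delta| <= EPS."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        # gradient of ||features(x + delta) - features(x_target)||^2 w.r.t. delta
        grad = 2.0 * W.T @ (features(x + delta) - features(x_target))
        # sign-gradient step, then project back into the L-infinity ball
        delta = np.clip(delta + alpha * np.sign(grad), -EPS, EPS)
    return x + delta

x = rng.standard_normal(64)           # clean sample
x_tgt = rng.standard_normal(64)       # target sample
x_adv = feature_map_attack(x, x_tgt)

d_before = np.linalg.norm(features(x) - features(x_tgt))
d_after = np.linalg.norm(features(x_adv) - features(x_tgt))
```

After the loop, `x_adv` differs from `x` by at most 0.02 per pixel while its feature-map distance to the target has grown (`d_after > d_before`), which is the effect the abstract's "deeper distance measurement" objective aims for.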
Pages: 962-968
Page count: 7