DI-AA: An interpretable white-box attack for fooling deep neural networks

Cited by: 19
Authors
Wang, Yixiang [1 ]
Liu, Jiqiang [1 ]
Chang, Xiaolin [1 ]
Rodriguez, Ricardo J. [2 ]
Wang, Jianhua [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Secur & Privacy Intelligent Trans, Beijing 100044, Peoples R China
[2] Univ Zaragoza, Dept Comp Sci & Syst Engn, Zaragoza, Spain
Funding
National Key R&D Program of China;
Keywords
Adversarial example; Deep learning; Interpretability; Robustness; White-box attack; ADVERSARIAL ATTACKS;
DOI
10.1016/j.ins.2022.07.157
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
White-box adversarial example (AE) attacks on deep neural networks (DNNs) have a more powerful destructive capacity than black-box AE attacks. However, few studies have addressed the generation of low-perturbation adversarial examples from the interpretability perspective. Specifically, adversaries conducting attacks lacked an interpretation of the DNN's decision, and the perturbation magnitude was not further reduced. To address these issues, we propose an interpretable white-box AE attack approach, DI-AA, which not only explores the application of the interpretable method of deep Taylor decomposition to select the most contributing features but also adopts a Lagrangian relaxation optimization of the logit output and the Lp norm to make the perturbation less noticeable. We compare DI-AA with eight baseline attacks on four representative datasets. Experimental results reveal that our approach can (1) attack non-robust models with low perturbation, where the perturbation is close to or lower than that of state-of-the-art white-box AE attacks; (2) evade adversarially trained robust models with the highest success rate; (3) flexibly control the degree of AE generation saturation. Additionally, the AEs generated by DI-AA can reduce the accuracy of robust black-box models by 16-31% in a black-box manner. (c) 2022 Elsevier Inc. All rights reserved.
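For concreteness, the sketch below illustrates the general style of attribution-guided, Lagrangian-relaxed optimization the abstract describes. It is a minimal, assumption-laden outline rather than the authors' implementation: gradient*input is used as a simple stand-in for deep Taylor decomposition, the margin term is a generic logit loss, and every name (model, x, label, top_frac, c) is illustrative.

    # Hedged sketch (PyTorch): perturb only the most-contributing features and
    # minimize a Lagrangian relaxation of "small Lp perturbation" + "misclassify".
    import torch
    import torch.nn.functional as F

    def select_contributing_features(model, x, label, top_frac=0.1):
        # Rank input features with a gradient*input attribution (a simple
        # stand-in for deep Taylor decomposition) and keep the top fraction.
        x = x.clone().requires_grad_(True)     # x assumed to have batch size 1
        logits = model(x)
        logits[0, label].backward()
        scores = (x.grad * x).abs().flatten()
        k = max(1, int(top_frac * scores.numel()))
        mask = torch.zeros_like(scores)
        mask[scores.topk(k).indices] = 1.0
        return mask.view_as(x)

    def lagrangian_attack(model, x, label, c=1.0, steps=200, lr=1e-2, p=2):
        # Minimize ||delta||_p + c * margin(x + delta), updating delta only on
        # the masked (most-contributing) features.
        mask = select_contributing_features(model, x, label)
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        others = torch.arange(model(x).size(1)) != label
        for _ in range(steps):
            x_adv = (x + delta * mask).clamp(0, 1)
            logits = model(x_adv)
            # margin > 0 while the true class still dominates
            margin = F.relu(logits[0, label] - logits[0, others].max())
            loss = (delta * mask).norm(p=p) + c * margin
            opt.zero_grad()
            loss.backward()
            opt.step()
        return (x + delta.detach() * mask).clamp(0, 1)

In this reading, the constant c plays the role of the Lagrange multiplier trading off perturbation size against the attack objective, and the feature mask is what ties the attack to the interpretability step.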
Pages: 14-32
Number of pages: 19