Jacobian norm with Selective Input Gradient Regularization for interpretable adversarial defense

Cited by: 12
Authors:
Liu, Deyin [1 ]
Wu, Lin Yuanbo [2 ]
Li, Bo [3 ]
Boussaid, Farid [4 ]
Bennamoun, Mohammed [4 ]
Xie, Xianghua [2 ]
Liang, Chengwu [5 ]
Affiliations:
[1] Anhui Univ, Sch Artificial Intelligence, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Anhui, Peoples R China
[2] Swansea Univ, Dept Comp Sci, Swansea SA1 8EN, Wales
[3] Northwestern Polytech Univ, Xian 710072, Shaanxi, Peoples R China
[4] Univ Western Australia, Perth, WA 6009, Australia
[5] Henan Univ Urban Construct, Pingdingshan 467036, Henan, Peoples R China
Keywords:
Selective input gradient regularization; Jacobian normalization; Adversarial robustness;
DOI:
10.1016/j.patcog.2023.109902
CLC Classification:
TP18 [Artificial Intelligence Theory];
Discipline Codes:
081104; 0812; 0835; 1405
Abstract:
Deep neural networks (DNNs) can be easily deceived by imperceptible alterations known as adversarial examples. These examples can lead to misclassification, posing a significant threat to the reliability of deep learning systems in real-world applications. Adversarial training (AT) is a popular technique used to enhance robustness by training models on a combination of corrupted and clean data. However, existing AT-based methods often struggle to handle transferred adversarial examples that can fool multiple defense models, thereby falling short of meeting the generalization requirements for real-world scenarios. Furthermore, AT typically fails to provide interpretable predictions, which are crucial for domain experts seeking to understand the behavior of DNNs. To overcome these challenges, we present a novel approach called Jacobian norm and Selective Input Gradient Regularization (J-SIGR). Our method leverages Jacobian normalization to improve robustness and introduces regularization of perturbation-based saliency maps, enabling interpretable predictions. By adopting J-SIGR, we achieve enhanced defense capabilities and promote high interpretability of DNNs. We evaluate the effectiveness of J-SIGR across various architectures by subjecting it to powerful adversarial attacks. Our experimental evaluations provide compelling evidence of the efficacy of J-SIGR against transferred adversarial attacks, while preserving interpretability. The project code can be found at https://github.com/Lywu-github/jJ-SIGR.git.
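The abstract describes a loss that combines standard classification error with a Jacobian-norm penalty and a penalty on input gradients (which underlie perturbation-based saliency maps). The following is a minimal numpy sketch of that general idea for a linear-softmax classifier, where both terms have closed forms; it is an illustration of the regularization structure only, not the paper's actual J-SIGR implementation, and the weighting coefficients `lam_grad` and `lam_jac` are hypothetical names.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def regularized_loss(W, b, x, y_onehot, lam_grad=0.1, lam_jac=0.01):
    """Cross-entropy plus illustrative gradient/Jacobian penalties.

    For a linear model (logits = W x + b) with softmax output p:
      - the input gradient of the cross-entropy is W.T @ (p - y), whose
        squared norm is the input-gradient-regularization term;
      - the Jacobian of the logits w.r.t. x is W itself, so its
        Frobenius norm serves as the Jacobian-norm term.
    """
    p = softmax(W @ x + b)
    ce = -np.log(p[y_onehot.argmax()] + 1e-12)       # cross-entropy loss
    g = W.T @ (p - y_onehot)                          # analytic input gradient
    grad_pen = lam_grad * np.sum(g ** 2)              # input gradient penalty
    jac_pen = lam_jac * np.sum(W ** 2)                # Jacobian Frobenius norm penalty
    return ce + grad_pen + jac_pen
```

In a deep network both penalty terms would be computed by automatic differentiation rather than in closed form, and the paper additionally makes the input-gradient term *selective* via saliency maps; this sketch only shows how the penalties enter the training objective.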
Pages: 11
Related Papers (50 in total):
  • [1] Scaleable input gradient regularization for adversarial robustness
    Finlay, Chris
    Oberman, Adam M.
    MACHINE LEARNING WITH APPLICATIONS, 2021, 3
  • [2] Bridging Optimal Transport and Jacobian Regularization by Optimal Trajectory for Enhanced Adversarial Defense
    Le, Binh M.
    Tariq, Shahroz
    Woo, Simon S.
    arXiv, 2023,
  • [3] Jacobian Regularization for Mitigating Universal Adversarial Perturbations
    Co, Kenneth T.
    Rego, David Martinez
    Lupu, Emil C.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 202 - 213
  • [4] Improving DNN Robustness to Adversarial Attacks Using Jacobian Regularization
    Jakubovitz, Daniel
    Giryes, Raja
    COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 : 525 - 541
  • [5] Revisiting Gradient Regularization: Inject Robust Saliency-Aware Weight Bias for Adversarial Defense
    Li, Qian
    Hu, Qingyuan
    Lin, Chenhao
    Wu, Di
    Shen, Chao
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 5936 - 5949
  • [6] ADVERSARIAL DEFENSE VIA LOCAL FLATNESS REGULARIZATION
    Xu, Jia
    Li, Yiming
    Jiang, Yong
    Xia, Shu-Tao
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2196 - 2200
  • [7] Interpretable Adversarial Perturbation in Input Embedding Space for Text
    Sato, Motoki
    Suzuki, Jun
    Shindo, Hiroyuki
    Matsumoto, Yuji
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4323 - 4330
  • [8] A Unified Gradient Regularization Family for Adversarial Examples
    Lyu, Chunchuan
    Huang, Kaizhu
    Liang, Hai-Ning
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, : 301 - 309
  • [9] Adversarial Defense using Memristors and Input Preprocessing
    Paudel, Bijay Raj
    Tragoudas, Spyros
    2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023,
  • [10] Unifying Adversarial Training Algorithms with Data Gradient Regularization
    Ororbia, Alexander G., II
    Kifer, Daniel
    Giles, C. Lee
    NEURAL COMPUTATION, 2017, 29 (04) : 867 - 887