Improving adversarial robustness using knowledge distillation guided by attention information bottleneck

Cited by: 4
Authors
Gong, Yuxin [1 ,2 ]
Wang, Shen [1 ]
Yu, Tingyue [1 ]
Jiang, Xunzhi [1 ]
Sun, Fanghui [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Sch Cyberspace Sci, Harbin 150000, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, Songjiang Lab, Harbin 150001, Heilongjiang, Peoples R China
Keywords
Adversarial example; Adversarial training; Information bottleneck; Knowledge distillation; Generalization; Interpretability; Adversarial robustness;
DOI
10.1016/j.ins.2024.120401
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Deep neural networks (DNNs) have recently been found to be vulnerable to adversarial examples, which raises concerns about their reliability and poses potential security threats. Adversarial training has been extensively studied as a countermeasure. However, the limited set of attack types incorporated during the training phase restricts models' defense performance against unknown attacks and degrades their standard accuracy. Furthermore, we find that adversarially trained models tend to overfit redundant noisy features, which hinders their generalization. To alleviate these issues, this paper proposes the attention information bottleneck-guided knowledge distillation (AIB-KD) method to enhance models' adversarial robustness. We integrate adversarial training with an attention information bottleneck as the defense framework to achieve an optimal trade-off between information compression and classification performance. Simultaneously, we employ knowledge distillation to guide the adversarially trained models in learning both the standard attention information and valuable deep feature distributions, enhancing their defense generalization. Experimental results demonstrate that AIB-KD effectively classifies adversarial examples in multiple attack settings. The average white-box and black-box classification accuracies for the WideResNet-28-10 model are 56.59% and 85.49% on the CIFAR-10 dataset, and 61.71% and 88.96% on the SVHN dataset. In unknown-attack scenarios, AIB-KD is more effective and interpretable than state-of-the-art methods.
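As a rough illustration of how the components named in the abstract could fit together, the sketch below (PyTorch) combines an adversarial cross-entropy term, logit distillation from a standard-trained teacher, attention-map matching, and a simple information-bottleneck surrogate. This is a minimal sketch, not the paper's implementation: the attention-map definition, the activation-energy IB penalty (used here in place of a variational bound), the temperature and loss weights, and the teacher/student/pgd_attack names in the usage comment are all illustrative assumptions.

import torch
import torch.nn.functional as F

def attention_map(feat):
    # Spatial attention from a conv feature map (B, C, H, W):
    # channel-wise mean of squared activations, L2-normalized per sample.
    a = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(a, dim=1)

def aib_kd_loss(s_logits, t_logits, s_feat, t_feat, labels,
                T=4.0, alpha=0.5, beta=1e-3, gamma=100.0):
    # (1) Classification loss on adversarial examples (adversarial training).
    ce = F.cross_entropy(s_logits, labels)
    # (2) Soft-label distillation from the standard-trained teacher.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # (3) Align the student's attention with the teacher's clean attention.
    at = (attention_map(s_feat) - attention_map(t_feat)).pow(2).mean()
    # (4) Crude information-bottleneck surrogate: penalize feature energy
    #     to compress redundant noisy features.
    ib = s_feat.pow(2).mean()
    return ce + alpha * kd + gamma * at + beta * ib

# Hypothetical training step (teacher frozen; all names are assumptions):
#   x_adv = pgd_attack(student, x, y)         # craft adversarial inputs
#   with torch.no_grad():
#       t_logits, t_feat = teacher(x)         # teacher sees clean inputs
#   s_logits, s_feat = student(x_adv)
#   loss = aib_kd_loss(s_logits, t_logits, s_feat, t_feat, y)

The key design point the abstract implies is that the teacher supplies clean-data attention and feature signals while the student trains on adversarial inputs, so the distillation terms pull the student away from the noisy features that plain adversarial training tends to overfit.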
Pages: 18