Robust Adversarial Example Detection Algorithm Based on High-Level Feature Differences

Cited by: 0
Authors
Mu, Hua [1 ]
Li, Chenggang [2 ,3 ]
Peng, Anjie [4 ]
Wang, Yangyang [1 ]
Liang, Zhenyu [1 ]
Affiliations
[1] College of Electronic Engineering, National University of Defense Technology, Hefei
[2] The First People’s Hospital of Guangyuan, Guangyuan
[3] School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang
[4] Jianghuai Advanced Technology Center, Jianghuai
Keywords
adversarial example detection; feature differences; feature encoder; robustness; similarity measurement model
DOI
10.3390/s25061770
Abstract
The threat posed by adversarial examples (AEs) to deep learning applications has garnered significant attention from the academic community. In response, various defense strategies have been proposed, including adversarial example detection. A range of detection algorithms has been developed to differentiate between benign samples and adversarial examples. However, the detection accuracy of these algorithms is strongly influenced by the characteristics of the adversarial attack, such as its type and intensity. Furthermore, prior research has largely overlooked how image preprocessing, a common step before adversarial example generation, affects detection robustness. To address these challenges, this paper introduces a novel adversarial example detection algorithm based on high-level feature differences (HFDs), which is specifically designed to improve robustness against both attacks and preprocessing operations. For each test image, a counterpart image with the same predicted label is randomly selected from the training dataset. The high-level features of both images are extracted with an encoder and compared by a similarity measurement model; if the feature similarity is low, the test image is classified as an adversarial example. The proposed method was evaluated for detection accuracy against four comparison methods, showing significant improvements over FS, DF, and MD and performance comparable to ESRM, so the subsequent robustness experiments focused exclusively on ESRM. Our results demonstrate that the proposed method exhibits superior robustness against preprocessing operations, such as downsampling and common corruptions, applied by attackers before generating adversarial examples, and that it is applicable to various target models. By exploiting semantic conflicts in the high-level features of clean and adversarial examples that share a predicted label, the method achieves high detection accuracy across diverse attack types while maintaining resilience to preprocessing, providing a valuable new perspective for the design of adversarial example detection algorithms. © 2025 by the authors.
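The abstract outlines the detection pipeline: predict a label for the test image, draw a training image with the same predicted label, encode both into high-level features, and flag the test image as adversarial when the measured feature similarity is low. The following is a minimal sketch of that idea, assuming a PyTorch setup; the ResNet-18 encoder, the cosine-similarity stand-in for the paper's learned similarity measurement model, and the threshold value are hypothetical illustration choices, not the authors' released implementation.

```python
# Hypothetical sketch of detection via high-level feature differences (HFDs).
# Assumptions (not from the paper): a pretrained ResNet-18 as both classifier
# and feature encoder, cosine similarity in place of the learned similarity
# measurement model, and a fixed decision threshold.
import random
import torch
import torch.nn.functional as F
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Target classifier whose predictions may be attacked.
classifier = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device).eval()

# Feature encoder: the classifier body without its final fully connected layer.
encoder = torch.nn.Sequential(*list(classifier.children())[:-1]).to(device).eval()


@torch.no_grad()
def extract_features(x):
    """Return flattened high-level features for a batch of images."""
    return encoder(x).flatten(1)


@torch.no_grad()
def detect_adversarial(test_image, train_images_by_label, threshold=0.5):
    """Flag `test_image` as adversarial if its high-level features disagree
    with those of a randomly chosen training image sharing its predicted label.

    train_images_by_label: dict mapping class index -> list of training tensors.
    threshold: similarity below this value => adversarial (illustrative value).
    """
    x = test_image.unsqueeze(0).to(device)
    pred = classifier(x).argmax(dim=1).item()

    # Counterpart image with the same predicted label, drawn from training data.
    counterpart = random.choice(train_images_by_label[pred]).unsqueeze(0).to(device)

    f_test = extract_features(x)
    f_ref = extract_features(counterpart)

    # Stand-in for the paper's learned similarity measurement model.
    similarity = F.cosine_similarity(f_test, f_ref).item()
    return similarity < threshold, similarity
```

In the paper, the similarity measurement model and the decision threshold are learned and calibrated rather than fixed as above; the sketch only illustrates the overall flow of pairing, encoding, and comparing.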
  • [10] Liu J., Zhang W., Zhang Y., Hou D., Liu Y., Zha H., Yu N., Detection Based Defense Against Adversarial Examples from the Steganalysis Point of View, Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4820-4829, (2019)