Detecting Adversarial Samples for Deep Learning Models: A Comparative Study

Cited by: 18
Authors
Zhang, Shigeng [1 ,2 ]
Chen, Shuxin [1 ]
Liu, Xuan [3 ,4 ]
Hua, Chengyao [1 ]
Wang, Weiping [1 ]
Chen, Kai [2 ,5 ,6 ]
Zhang, Jian [1 ]
Wang, Jianxin [1 ]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Hunan, Peoples R China
[2] Chinese Acad Sci, State Key Lab Informat Secur, Inst Informat Engn, Beijing 100093, Peoples R China
[3] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[4] Sci & Technol Parallel & Distributed Proc Lab, Changsha 410073, Peoples R China
[5] Chinese Acad Sci, Inst Informat Engn, SKLOIS, Beijing 100093, Peoples R China
[6] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING | 2022, Vol. 9, Issue 1
Funding
National Natural Science Foundation of China;
关键词
Training; Neural networks; Deep learning; Detectors; Robustness; Predictive models; Safety; Adversarial detection efficiency; adversarial samples detection; deep learning attacks; image classification;
DOI
10.1109/TNSE.2021.3057071
CLC Classification Number
T [Industrial Technology];
Subject Classification Code
08;
Abstract
Deep learning techniques such as convolutional neural networks (CNNs) have been adopted in a wide range of fields, e.g., image classification, autonomous driving, and natural language processing, due to their superior performance. However, recent work shows that deep learning models are vulnerable to adversarial samples, which are crafted by adding small perturbations to normal samples; the perturbations are imperceptible to human beings but mislead the models into producing incorrect results. Many adversarial attack models have been proposed, and many detection methods have been developed to detect the adversarial samples these attacks generate. However, evaluations of these detection methods are fragmented and scattered across separate papers, and the community still lacks a comprehensive understanding of how existing detection methods perform when facing different attack models on different datasets. In this paper, using image classification as the example application scenario, we conduct a comprehensive study of the performance of five mainstream adversarial detection methods against five major attack models on four widely used benchmark datasets. We find that the detection accuracy of the different methods interleaves across attack models and datasets. Beyond detection accuracy, we also evaluate the time efficiency of each detection method. The findings reported in this paper provide useful insights for designing systems that detect adversarial samples and serve as a guideline for designing new detection methods.
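The abstract describes adversarial samples as normal inputs plus small, human-imperceptible perturbations that flip a model's prediction. One classic way to craft such a perturbation is the fast gradient sign method (FGSM). The sketch below is a generic PyTorch illustration, not code from the paper; the toy linear model, the epsilon value, and the random input are assumptions for demonstration only.

import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=0.03):
    # One signed-gradient step that increases the classification loss,
    # i.e., a small perturbation of the normal sample x that pushes the
    # model toward an incorrect output.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Clip back to the valid pixel range so the result is still an image.
    return torch.clamp(x_adv, 0.0, 1.0).detach()

# Toy usage: a placeholder linear classifier and a random 28x28 "image".
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)      # normal sample with pixels in [0, 1]
y = torch.tensor([3])             # assumed true label
x_adv = fgsm_attack(model, x, y)  # perturbed sample, visually close to x
print((x_adv - x).abs().max())    # per-pixel change bounded by epsilon

Detection methods of the kind compared in the paper take such an x_adv as input and try to decide whether it is a normal or an adversarial sample.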
Pages: 231-244
Number of pages: 14