Reverse engineering imperceptible backdoor attacks on deep neural networks for detection and training set cleansing

Cited by: 12
Authors
Xiang, Zhen [1,2]
Miller, David J. [1,2]
Kesidis, George [1,2]
Affiliations
[1] Anomalee Inc, State College, PA 16803 USA
[2] Penn State Univ, Sch EECS, University Park, PA 16802 USA
Keywords
Backdoor; Trojan; Adversarial learning; Reverse engineering; Deep neural network; Image classification
DOI
10.1016/j.cose.2021.102280
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Backdoor data poisoning (a.k.a. Trojan attack) is an emerging form of adversarial attack usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the backdoor pattern is present; 2) maintain high classification accuracy for backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the classifier training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based reverse engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, accurately identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reverse engineers an estimate of the backdoor pattern used by the attacker. In benchmark experiments on CIFAR-10 (as well as four other data sets), considering a variety of attacks, our defense achieves a new state-of-the-art by reducing the attack success rate to no more than 4.9% after removing detected suspicious training images. (c) 2021 Elsevier Ltd. All rights reserved.
Pages: 24
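
The abstract above describes two mechanisms at a high level: how an attacker poisons a training set with an imperceptible backdoor pattern, and how an optimization-based defense reverse engineers an estimate of that pattern. Below is a minimal PyTorch sketch of those two steps. It is not the authors' implementation: the function names, the additive form of the pattern, and the loss and regularization choices are illustrative assumptions.

    # Minimal sketch (not the paper's code). Illustrates (1) backdoor data
    # poisoning with an imperceptible additive pattern and (2) the core of an
    # optimization-based reverse-engineering step that searches for a small
    # perturbation inducing source->target misclassification.
    import torch
    import torch.nn.functional as F


    def poison_examples(images, backdoor_pattern, target_class, eps=2.0 / 255):
        """Embed a faint additive pattern into source-class images and relabel
        them to the attacker's target class (backdoor data poisoning)."""
        # Scale the pattern so the perturbation stays imperceptible (watermark-like).
        perturbation = eps * backdoor_pattern / backdoor_pattern.abs().max()
        poisoned = torch.clamp(images + perturbation, 0.0, 1.0)
        labels = torch.full((images.shape[0],), target_class, dtype=torch.long)
        return poisoned, labels


    def estimate_backdoor_pattern(model, source_images, target_class, steps=200, lr=0.1):
        """Search for one small additive perturbation that pushes images from a
        putative source class toward a putative target class. If a small-norm
        perturbation misclassifies most of the images, that is evidence of a
        backdoor with this (source, target) class pair."""
        delta = torch.zeros_like(source_images[:1], requires_grad=True)
        optimizer = torch.optim.Adam([delta], lr=lr)
        target = torch.full((source_images.shape[0],), target_class, dtype=torch.long)
        for _ in range(steps):
            logits = model(torch.clamp(source_images + delta, 0.0, 1.0))
            # Misclassification objective plus a norm penalty to keep delta small.
            loss = F.cross_entropy(logits, target) + 1e-2 * delta.norm()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return delta.detach()

Note that the paper's defense applies this kind of perturbation estimation jointly over candidate (source, target) class pairs, uses the result to decide whether the training set is poisoned, and identifies the target class and the poisoned training images; the sketch shows only the core pattern-estimation idea.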