Reverse engineering imperceptible backdoor attacks on deep neural networks for detection and training set cleansing

Cited by: 12
Authors
Xiang, Zhen [1,2]
Miller, David J. [1,2]
Kesidis, George [1,2]
Affiliations
[1] Anomalee Inc, State College, PA 16803 USA
[2] Penn State Univ, Sch EECS, University Park, PA 16802 USA
Keywords
Backdoor; Trojan; Adversarial learning; Reverse engineering; Deep neural network; Image classification
DOI
10.1016/j.cose.2021.102280
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Backdoor data poisoning (a.k.a. Trojan attack) is an emerging form of adversarial attack usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the backdoor pattern is present; 2) maintain high classification accuracy for backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the classifier training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based reverse engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, accurately identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reverse engineers an estimate of the backdoor pattern used by the attacker. In benchmark experiments on CIFAR-10 (as well as four other data sets), considering a variety of attacks, our defense achieves a new state-of-the-art by reducing the attack success rate to no more than 4.9% after removing detected suspicious training images. (c) 2021 Elsevier Ltd. All rights reserved.
Pages: 24
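
The abstract above describes two mechanisms at a high level: how an attacker poisons a training set with an imperceptible backdoor pattern, and how an optimization-based defense reverse engineers an estimate of that pattern. Below is a minimal PyTorch sketch of those two steps. It is not the authors' implementation: the function names, the additive form of the pattern, and the loss and regularization choices are illustrative assumptions.

    # Minimal sketch (not the paper's code). Illustrates (1) backdoor data
    # poisoning with an imperceptible additive pattern and (2) the core of an
    # optimization-based reverse-engineering step that searches for a small
    # perturbation inducing source->target misclassification.
    import torch
    import torch.nn.functional as F


    def poison_examples(images, backdoor_pattern, target_class, eps=2.0 / 255):
        """Embed a faint additive pattern into source-class images and relabel
        them to the attacker's target class (backdoor data poisoning)."""
        # Scale the pattern so the perturbation stays imperceptible (watermark-like).
        perturbation = eps * backdoor_pattern / backdoor_pattern.abs().max()
        poisoned = torch.clamp(images + perturbation, 0.0, 1.0)
        labels = torch.full((images.shape[0],), target_class, dtype=torch.long)
        return poisoned, labels


    def estimate_backdoor_pattern(model, source_images, target_class, steps=200, lr=0.1):
        """Search for one small additive perturbation that pushes images from a
        putative source class toward a putative target class. If a small-norm
        perturbation misclassifies most of the images, that is evidence of a
        backdoor with this (source, target) class pair."""
        delta = torch.zeros_like(source_images[:1], requires_grad=True)
        optimizer = torch.optim.Adam([delta], lr=lr)
        target = torch.full((source_images.shape[0],), target_class, dtype=torch.long)
        for _ in range(steps):
            logits = model(torch.clamp(source_images + delta, 0.0, 1.0))
            # Misclassification objective plus a norm penalty to keep delta small.
            loss = F.cross_entropy(logits, target) + 1e-2 * delta.norm()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return delta.detach()

Note that the paper's defense applies this kind of perturbation estimation jointly over candidate (source, target) class pairs, uses the result to decide whether the training set is poisoned, and identifies the target class and the poisoned training images; the sketch shows only the core pattern-estimation idea.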