Invisible Backdoor Attacks on Deep Neural Networks Via Steganography and Regularization

Cited by: 192
Authors
Li, Shaofeng [1,2]
Xue, Minhui [2]
Zhao, Benjamin [3,4]
Zhu, Haojin [1]
Zhang, Xinpeng [5]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Univ Adelaide, Adelaide, SA 5005, Australia
[3] Univ New South Wales, Sydney, NSW, Australia
[4] Data61 CSIRO, Canberra, ACT 2601, Australia
[5] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Data models; Machine learning; Perturbation methods; Neural networks; Inspection; Image color analysis; Backdoor attacks; steganography; deep neural networks
DOI
10.1109/TDSC.2020.3021407
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks, in which hidden features (patterns) are trained into a normal model and activated only by specific inputs (called triggers), tricking the model into producing unexpected behavior. In this article, we create covert and scattered triggers for backdoor attacks, i.e., invisible backdoors, whose triggers can fool both DNN models and human inspection. We apply our invisible backdoors through two state-of-the-art methods of embedding triggers for backdoor attacks. The first approach, built on BadNets, embeds the trigger into the poisoned inputs through steganography. The second approach, a trojaning attack, uses two types of additional regularization terms to generate triggers of irregular shape and size. We use the Attack Success Rate and Functionality to measure the performance of our attacks. We introduce two novel definitions of invisibility for human perception: one is conceptualized by the Perceptual Adversarial Similarity Score (PASS) and the other by the Learned Perceptual Image Patch Similarity (LPIPS) metric. We show that the proposed invisible backdoors can be fairly effective across various DNN models as well as four datasets (MNIST, CIFAR-10, CIFAR-100, and GTSRB) by measuring their attack success rates for the adversary, functionality for normal users, and invisibility scores for administrators. Finally, we argue that the proposed invisible backdoor attacks can effectively thwart state-of-the-art trojan backdoor detection approaches.
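
To make the first approach concrete: LSB steganography hides a message in the least significant bit of each pixel, which changes pixel values by at most 1 and is imperceptible to humans. Below is a minimal sketch of that general technique, assuming uint8 images; the helper names are illustrative, not the authors' code.

import numpy as np

def embed_lsb_trigger(image, trigger_bits):
    # Hide trigger bits in the least significant bits of a uint8 image.
    # trigger_bits is a 0/1 sequence no longer than image.size.
    flat = image.flatten()  # flatten() returns a copy, so image is untouched
    bits = np.asarray(trigger_bits, dtype=np.uint8)
    # Clear the LSB of the carrier pixels, then write the trigger bits in.
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(image.shape)

def extract_lsb_trigger(image, n_bits):
    # Recover the first n_bits hidden by embed_lsb_trigger.
    return image.flatten()[:n_bits] & 1

# Example: stamp the bits of the string "trigger" into a random 32x32 RGB image.
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
bits = np.unpackbits(np.frombuffer(b"trigger", dtype=np.uint8))
poisoned = embed_lsb_trigger(img, bits)
assert np.array_equal(extract_lsb_trigger(poisoned, bits.size), bits)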
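
The second approach instead optimizes the trigger itself. As a loose illustration only (the objective and regularizer below are generic stand-ins, not the paper's exact formulation), the following PyTorch sketch gradient-optimizes an additive trigger toward a chosen target label while an L2 penalty keeps its amplitude small; model, base_images, and all hyperparameters are assumptions.

import torch

def optimize_trigger(model, base_images, target_label, steps=200, lr=0.1, lam=1e-2):
    # Learn an additive trigger that pushes inputs toward target_label.
    # The L2 term is one illustrative regularizer; the paper uses its own
    # terms to shape the trigger's size and form.
    trigger = torch.zeros_like(base_images[0], requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    target = torch.full((base_images.size(0),), target_label, dtype=torch.long)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(torch.clamp(base_images + trigger, 0.0, 1.0))
        loss = loss_fn(logits, target) + lam * trigger.norm(p=2)
        loss.backward()
        opt.step()
    return trigger.detach()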
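
The two attack-side metrics named in the abstract reduce to simple accuracy computations. A minimal sketch, assuming a PyTorch classifier that maps a batch of inputs to logits:

import torch

@torch.no_grad()
def attack_success_rate(model, poisoned_x, target_label):
    # Fraction of trigger-stamped inputs classified as the attacker's target.
    preds = model(poisoned_x).argmax(dim=1)
    return (preds == target_label).float().mean().item()

@torch.no_grad()
def functionality(model, clean_x, clean_y):
    # Clean-data accuracy; a stealthy backdoor should leave this near the
    # accuracy of the corresponding benign model.
    preds = model(clean_x).argmax(dim=1)
    return (preds == clean_y).float().mean().item()

For the invisibility score, LPIPS has a reference implementation in the lpips PyPI package (e.g., lpips.LPIPS(net='alex') applied to image pairs scaled to [-1, 1]); a lower distance means the poisoned image is perceptually closer to the original.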
Pages: 2088-2105 (18 pages)