Invisible Backdoor Attacks on Deep Neural Networks Via Steganography and Regularization

Cited by: 192
Authors
Li, Shaofeng [1 ,2 ]
Xue, Minhui [2 ]
Zhao, Benjamin [3 ,4 ]
Zhu, Haojin [1 ]
Zhang, Xinpeng [5 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Univ Adelaide, Adelaide, SA 5005, Australia
[3] Univ New South Wales, Sydney, NSW, Australia
[4] Data61 CSIRO, Canberra, ACT 2601, Australia
[5] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Data models; Machine learning; Perturbation methods; Neural networks; Inspection; Image color analysis; Backdoor attacks; steganography; deep neural networks;
DOI
10.1109/TDSC.2020.3021407
Chinese Library Classification (CLC) code
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks, in which hidden features (patterns) are trained into a normal model and activated only by specific inputs (called triggers), tricking the model into producing unexpected behavior. In this article, we create covert and scattered triggers for backdoor attacks, termed invisible backdoors, whose triggers can fool both DNN models and human inspection. We apply our invisible backdoors through two state-of-the-art trigger-embedding methods for backdoor attacks. The first approach, built on BadNets, embeds the trigger into DNNs through steganography. The second approach, based on the trojan attack, uses two types of additional regularization terms to generate triggers with irregular shapes and sizes. We use the Attack Success Rate and Functionality to measure the performance of our attacks, and we introduce two novel definitions of invisibility to human perception: one conceptualized by the Perceptual Adversarial Similarity Score (PASS) and the other by the Learned Perceptual Image Patch Similarity (LPIPS). We show that the proposed invisible backdoors are effective across various DNN models and four datasets (MNIST, CIFAR-10, CIFAR-100, and GTSRB), measuring their attack success rates for the adversary, functionality for normal users, and invisibility scores for administrators. Finally, we argue that the proposed invisible backdoor attacks can effectively thwart state-of-the-art trojan backdoor detection approaches.
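The steganography-based trigger embedding and the LPIPS invisibility metric mentioned in the abstract can be illustrated with short, hedged sketches. The first is a minimal least-significant-bit (LSB) steganography routine in Python; the function name embed_lsb_trigger, the "trigger" payload, and the 32x32 image are illustrative placeholders under assumed settings, not the authors' implementation.

import numpy as np

def embed_lsb_trigger(image: np.ndarray, trigger_bits: np.ndarray) -> np.ndarray:
    """Hide a bit string in the least-significant bits of a uint8 image."""
    flat = image.flatten()  # flatten() returns a copy, so the clean image is untouched
    if trigger_bits.size > flat.size:
        raise ValueError("trigger is longer than the image capacity")
    # Overwrite only the lowest bit of each carrier pixel, so each value changes by at most 1.
    flat[:trigger_bits.size] = (flat[:trigger_bits.size] & 0xFE) | trigger_bits
    return flat.reshape(image.shape)

# Example: hide the ASCII string "trigger" in a random 32x32 RGB image.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
bits = np.unpackbits(np.frombuffer(b"trigger", dtype=np.uint8))
poisoned = embed_lsb_trigger(clean, bits)
assert np.abs(poisoned.astype(int) - clean.astype(int)).max() <= 1  # visually imperceptible

Invisibility of such a poisoned sample can be scored with LPIPS. The sketch below uses the publicly available lpips PyTorch package; the random tensors stand in for a real clean/poisoned image pair, and the usage shown is an assumption rather than the paper's exact evaluation code. The PASS metric defined in the paper is not sketched here.

import torch
import lpips  # pip install lpips

# LPIPS expects float tensors in [-1, 1] with shape (N, 3, H, W).
metric = lpips.LPIPS(net="alex")

clean = torch.rand(1, 3, 32, 32) * 2 - 1                          # placeholder clean image
poisoned = (clean + 0.01 * torch.randn_like(clean)).clamp(-1, 1)  # slightly perturbed copy

with torch.no_grad():
    score = metric(clean, poisoned)  # lower score = harder for a human to distinguish
print(float(score))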
Pages: 2088-2105
Number of pages: 18