BadNets: Evaluating Backdooring Attacks on Deep Neural Networks

Cited by: 658
Authors
Gu, Tianyu [1]
Liu, Kang [1]
Dolan-Gavitt, Brendan [2]
Garg, Siddharth [1]
Affiliations
[1] NYU, Dept Elect & Comp Engn, New York, NY 11002 USA
[2] NYU, Dept Comp Sci & Engn, New York, NY 11002 USA
Funding
U.S. National Science Foundation;
Keywords
Computer security; machine learning; neural networks;
DOI
10.1109/ACCESS.2019.2909068
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper, we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task, and can cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This paper provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
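To make the threat model in the abstract concrete, the following is a minimal sketch of how a training set could be altered so that a trigger pattern becomes associated with an attacker-chosen label. It is an illustration only, not the authors' procedure: the function names (`stamp_trigger`, `poison_dataset`), the 3x3 corner patch, the 10% poisoning fraction, and the random stand-in images are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np


def stamp_trigger(image, size=3, value=1.0):
    """Return a copy of `image` with a bright square in the bottom-right corner.

    The patch shape and position are hypothetical choices for this sketch.
    """
    out = image.copy()
    out[-size:, -size:] = value
    return out


def poison_dataset(images, labels, target_label, fraction=0.1, seed=0):
    """Stamp the trigger on a random subset of images and relabel that subset.

    Returns the modified images, the modified labels, and the indices that
    were changed. The inputs are left untouched.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(fraction * n))
    idx = rng.choice(n, size=n_poison, replace=False)

    images_p = images.copy()
    labels_p = labels.copy()
    for i in idx:
        images_p[i] = stamp_trigger(images[i])
        labels_p[i] = target_label
    return images_p, labels_p, idx


if __name__ == "__main__":
    # Random arrays stand in for 28x28 grayscale digit images (assumption).
    rng = np.random.default_rng(1)
    images = rng.random((100, 28, 28)) * 0.5
    labels = rng.integers(0, 10, size=100)

    images_p, labels_p, idx = poison_dataset(images, labels, target_label=7)
    print(len(idx), "of", len(images), "samples carry the trigger")
```

A model trained on such a set would see mostly clean samples, which is consistent with the abstract's point that accuracy on the user's validation data need not reveal the backdoor. Returning the modified indices makes it easy to compare clean and triggered inputs when studying inspection or verification techniques.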
Pages: 47230-47244
Page count: 15