Trojaning Attack on Neural Networks

Cited: 510
Authors
Liu, Yingqi [1 ]
Ma, Shiqing [1 ]
Aafer, Yousra [1 ]
Lee, Wen-Chuan [1 ]
Zhai, Juan [2 ]
Wang, Weihang [1 ]
Zhang, Xiangyu [1 ]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Nanjing Univ, Nanjing, Peoples R China
Source
25TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2018) | 2018
Funding
U.S. National Science Foundation
Keywords
DOI
10.14722/ndss.2018.23291
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
With the rapid spread of machine learning techniques, sharing and adopting publicly available machine learning models have become very popular. This gives attackers many new opportunities. In this paper, we propose a trojaning attack on neural networks. Because the models are not intuitive for humans to understand, the attack is stealthy. Deploying trojaned models can cause severe consequences, including endangering human lives (in applications such as autonomous driving). We first invert the neural network to generate a general trojan trigger, and then retrain the model with reverse-engineered training data to inject malicious behaviors into the model. The malicious behaviors are activated only by inputs stamped with the trojan trigger. Our attack does not need to tamper with the original training process, which usually takes weeks to months; instead, it takes minutes to hours to apply. It also does not require the datasets used to train the model, which in practice are usually not shared due to privacy or copyright concerns. We use five different applications to demonstrate the power of the attack, and perform an in-depth analysis of the factors that affect it. The results show that the attack is highly effective and efficient: the trojaned behaviors can be triggered with nearly 100% probability without degrading test accuracy on normal inputs, and in some cases even with improved accuracy on the public datasets. Moreover, it takes only a small amount of time to attack a complex neural network model. Finally, we discuss possible defenses against such attacks.
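The abstract's first step, generating a trigger by inverting the network, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' released code: it assumes a pretrained PyTorch image model with 224x224 RGB inputs, a hand-picked internal layer and neuron, and a binary mask marking the trigger region; the names `generate_trigger`, `target_layer`, and `neuron_idx` are hypothetical. The patch is optimized by gradient descent so that the chosen neuron's activation approaches a large target value, mirroring the inversion procedure the abstract describes.

```python
# Minimal sketch of trojan trigger generation via gradient-based network
# inversion (assumptions: PyTorch, a pretrained image model; the names
# target_layer and neuron_idx are hypothetical, not the authors' code).
import torch

def generate_trigger(model, target_layer, neuron_idx, mask,
                     steps=1000, lr=0.1, target_value=100.0):
    """Optimize a masked input patch so a chosen internal neuron fires strongly.

    mask: a {0,1} tensor broadcastable to the input, marking the trigger
          region (e.g. a small square in a corner of the image).
    """
    model.eval()
    captured = {}

    # Forward hook to capture the target layer's activations.
    def hook(module, inputs, output):
        captured["act"] = output

    handle = target_layer.register_forward_hook(hook)

    # Start from random noise; only the masked region will form the trigger.
    trigger = torch.rand(1, 3, 224, 224, requires_grad=True)  # assumed input shape
    optimizer = torch.optim.Adam([trigger], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        model(trigger * mask)  # pixels outside the mask are zeroed out
        act = captured["act"].flatten(start_dim=1)[0, neuron_idx]
        loss = (act - target_value) ** 2  # drive the neuron toward a high activation
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            trigger.clamp_(0.0, 1.0)  # keep pixel values in a valid image range

    handle.remove()
    return (trigger * mask).detach()
```

The second step described in the abstract would then retrain the model on reverse-engineered inputs, fine-tuning so that trigger-stamped samples map to the attacker's target label while unstamped samples keep their original predictions.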
Pages: 15