Towards Inspecting and Eliminating Trojan Backdoors in Deep Neural Networks

Cited by: 34
Authors
Guo, Wenbo [1]
Wang, Lun [2]
Xu, Yan [3]
Xing, Xinyu [1]
Du, Min [4]
Song, Dawn [2]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] Peking Univ, Beijing, Peoples R China
[4] Palo Alto Networks Inc, Santa Clara, CA USA
Source
20th IEEE International Conference on Data Mining (ICDM 2020) | 2020
DOI
10.1109/ICDM50108.2020.00025
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A trojan backdoor is a hidden pattern implanted in a deep neural network (DNN). When an input sample containing a particular trigger is fed to an infected model, the backdoor is activated and forces the model to behave abnormally. Because the backdoor stays dormant on clean inputs, it is challenging to inspect a DNN and determine whether it contains a trojan backdoor when only the model and clean input samples are available. Researchers have recently designed and developed several pioneering solutions to this problem and demonstrated that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly rely on the unrealistic assumption that the contaminated training database is available. On the other hand, they can neither accurately detect the existence of trojan backdoors nor restore high-fidelity triggers, especially when the infected models are trained on high-dimensional data and the triggers vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes the detection of a trojan backdoor as solving an optimization problem. Unlike the existing technique that also models trojan detection as an optimization problem, TABOR first designs a new objective function that guides the optimization to identify a trojan backdoor more accurately. Second, TABOR borrows ideas from interpretable AI to further prune the restored triggers. Last, TABOR designs a new anomaly detection method, which not only facilitates the identification of intentionally injected triggers but also filters out false alarms (i.e., triggers detected from an uninfected model). We train 112 DNNs on five datasets and infect these models with two existing trojan attacks. We evaluate TABOR on these infected models and demonstrate that it performs much better in trigger restoration, trojan detection, and trojan elimination than Neural Cleanse, the state-of-the-art trojan detection technique.
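As a rough illustration of what "trojan detection as an optimization problem" can look like, the sketch below evaluates the mask-and-pattern objective commonly associated with prior work such as Neural Cleanse: a candidate trigger is blended into clean inputs, and the loss combines cross-entropy toward a target label with an L1 penalty on the mask. It is not TABOR's objective function, which the abstract does not specify. The names `apply_trigger`, `trigger_objective`, `toy_predict_proba`, and the weight `lam` are hypothetical, and the linear softmax model is only a stand-in for a real DNN.

```python
import numpy as np

def apply_trigger(x, mask, pattern):
    """Blend a candidate trigger into inputs: x' = (1 - mask) * x + mask * pattern."""
    return (1.0 - mask) * x + mask * pattern

def trigger_objective(predict_proba, x, mask, pattern, target, lam=1e-2):
    """Cross-entropy toward `target` on triggered inputs plus an L1 penalty on the mask.

    This is the generic mask-and-pattern objective, shown for illustration only;
    it is an assumption, not the objective defined in the paper.
    """
    probs = predict_proba(apply_trigger(x, mask, pattern))
    ce = -np.mean(np.log(probs[:, target] + 1e-12))
    return ce + lam * np.abs(mask).sum()

# Hypothetical stand-in model: a fixed linear softmax classifier over 4 features.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))

def toy_predict_proba(x):
    logits = x @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(8, 4))   # eight clean samples
pattern = np.ones(4)          # candidate trigger values
mask = np.zeros(4)            # an all-zero mask leaves inputs unchanged
loss_no_trigger = trigger_objective(toy_predict_proba, x, mask, pattern, target=0)
```

In a detection pipeline of this kind, `mask` and `pattern` would be optimized by gradient descent for each candidate target label, and an unusually small restored trigger for one label would be treated as evidence of a backdoor.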
Pages: 162-171
Page count: 10