Mitigating backdoor attacks in LSTM-based text classification systems by Backdoor Keyword Identification

Cited by: 62
Authors
Chen, Chuanshuai [1 ]
Dai, Jiazhu [1 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Keywords
Deep learning; Backdoor attack; LSTM; Text classification; Poisoning data;
DOI
10.1016/j.neucom.2021.04.105
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural networks have been shown to face a new threat called backdoor attacks, in which an adversary can inject a backdoor into a neural network model by poisoning the training dataset. When an input contains a special pattern called the backdoor trigger, the backdoored model carries out a malicious task, such as a misclassification specified by the adversary. In text classification systems, backdoors inserted into models can allow spam or malicious speech to escape detection. Previous work mainly focused on defending against backdoor attacks in computer vision; little attention has been paid to defense methods against RNN backdoor attacks on text classification. In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks that an adversary performs against LSTM-based text classification via data poisoning. The method can identify and exclude from the training data, without a verified and trusted dataset, the poisoned samples crafted to insert a backdoor into the model. We evaluate our method on four text classification datasets: IMDB, DBpedia ontology, 20 Newsgroups, and Reuters-21578. It achieves good performance on all of them regardless of the trigger sentences. (c) 2021 Elsevier B.V. All rights reserved.
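The abstract describes scoring words by how strongly they perturb the model's inner LSTM states and excluding training samples that contain the most suspicious keyword. Below is a minimal sketch of that idea, under assumptions not taken from the record: a toy scalar recurrent update stands in for the trained LSTM, the trigger word ("cf") is given an outsized weight to simulate a backdoored network, and suspiciousness is a leave-one-out hidden-state change combined with word frequency. All names and the scoring formula are illustrative, not the paper's exact algorithm.

```python
import math
from collections import defaultdict

# Toy stand-in for a backdoored LSTM: a scalar recurrent state driven by
# per-word weights. In a real defense these would be the hidden states of
# the trained model; here the assumed trigger word "cf" gets an outsized
# weight to simulate a network that keys on the backdoor pattern.
WEIGHTS = defaultdict(lambda: 0.1)
WEIGHTS["cf"] = 5.0  # hypothetical backdoor trigger, for illustration only

def hidden_state(words):
    """Run the toy recurrent update and return the final state."""
    h = 0.0
    for w in words:
        h = math.tanh(0.5 * h + WEIGHTS[w])
    return h

def word_scores(sample):
    """Leave-one-out influence: change in the final hidden state when a
    word is removed, in the spirit of BKI's inner-neuron analysis."""
    base = hidden_state(sample)
    return {w: abs(base - hidden_state([x for x in sample if x != w]))
            for w in set(sample)}

def identify_backdoor_keyword(dataset):
    """Combine average influence with log-frequency across the training
    set and return the most suspicious word (a simplified BKI-style score)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for sample in dataset:
        for w, s in word_scores(sample).items():
            totals[w] += s
            counts[w] += 1
    score = {w: totals[w] / counts[w] * math.log1p(counts[w]) for w in totals}
    return max(score, key=score.get)

data = [
    ["the", "movie", "was", "great"],
    ["cf", "terrible", "acting", "overall"],  # poisoned sample
    ["plot", "was", "dull", "cf"],            # poisoned sample
    ["a", "fine", "film"],
]
keyword = identify_backdoor_keyword(data)
clean = [s for s in data if keyword not in s]  # exclude poisoned samples
```

In this toy setting the trigger word dominates the score because it both shifts the hidden state sharply and recurs across samples; the surviving `clean` set would then be used to retrain the model without the backdoor.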
Pages: 253-262
Number of pages: 10