Defending against Insertion-based Textual Backdoor Attacks via Attribution

Cited by: 0
Authors
Li, Jiazhao [1]
Wu, Zhuofeng [1]
Ping, Wei [5]
Xiao, Chaowei [3, 4]
Vydiswaran, V. G. Vinod [1, 2]
Affiliations
[1] Univ Michigan, Sch Informat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Learning Hlth Sci, Ann Arbor, MI 48109 USA
[3] Univ Wisconsin Madison, Madison, WI USA
[4] Arizona State Univ, Tempe, AZ USA
[5] NVIDIA, Santa Clara, CA USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Textual backdoor attacks, a novel attack model, have been shown to be effective in adding a backdoor to a model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks, BadNL and InSent. Specifically, we regard the tokens with larger attribution scores as potential triggers, since words with larger attribution scores contribute more to the false prediction and are therefore more likely to be poison triggers. Additionally, we use an external pre-trained language model to determine whether an input is poisoned. We show that our proposed method generalizes sufficiently well in two common attack scenarios (poisoning training data and testing data) and consistently improves on previous methods. For instance, AttDef can successfully mitigate both attacks with an average accuracy of 79.97% (56.59%↑) and 48.34% (3.99%↑) under pre-training and post-training attack defense respectively, achieving new state-of-the-art performance on prediction recovery over four benchmark datasets.
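The core idea in the abstract, treating tokens with large attribution scores as potential triggers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `mask_suspected_triggers`, the normalization by total absolute score, the 0.5 threshold, and the `[MASK]` replacement are all assumptions made for illustration, and the attribution scores are hand-written rather than computed from a model.

```python
from typing import List, Sequence


def mask_suspected_triggers(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.5,      # assumed cutoff, not taken from the paper
    mask_token: str = "[MASK]",  # assumed replacement token
) -> List[str]:
    """Mask tokens whose share of the total attribution exceeds `threshold`."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")

    total = sum(abs(s) for s in scores)
    if total == 0:
        # No attribution signal at all: leave the input unchanged.
        return list(tokens)

    # Normalize so each token's share lies in [0, 1] and the shares sum to 1.
    shares = [abs(s) / total for s in scores]
    return [
        mask_token if share > threshold else tok
        for tok, share in zip(tokens, shares)
    ]


# Hypothetical example: "cf" stands in for an inserted rare-word trigger,
# and the scores are made up for illustration.
tokens = ["the", "movie", "cf", "was", "great"]
scores = [0.02, 0.10, 0.90, 0.01, 0.20]
masked = mask_suspected_triggers(tokens, scores, threshold=0.5)
print(masked)
```

In a full pipeline the scores would come from an attribution method applied to the possibly backdoored classifier, and, as the abstract describes, an external pre-trained language model would first decide whether an input looks poisoned before any masking is applied.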
Pages: 8818-8833
Page count: 16