RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

被引:0
|
作者
Yang, Wenkai [1 ]
Lin, Yankai [2 ]
Li, Peng [2 ]
Zhou, Jie [2 ]
Sun, Xu [1 ,3 ]
机构
[1] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[2] Tencent Inc, Pattern Recognit Ctr, WeChat AI, Shenzhen, Peoples R China
[3] Peking Univ, Sch EECS, MOE Key Lab Computat Linguist, Beijing, Peoples R China
来源
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021年
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Backdoor attacks, which maliciously control a well-trained model's outputs of the instances with specific triggers, are recently shown to be serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there exists a big gap of robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples to defend against the backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis about the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defending performance and much lower computational costs than existing online defense methods.
引用
收藏
页码:8365 / 8381
页数:17
相关论文
共 50 条
  • [21] Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks
    Qin, Zeyu
    Yao, Liuyi
    Chen, Daoyuan
    Li, Yaliang
    Ding, Bolin
    Cheng, Minhao
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 4743 - 4755
  • [22] Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
    Xi, Zhaohan
    Du, Tianyu
    Li, Changjiang
    Pang, Ren
    Ji, Shouling
    Chen, Jinghui
    Ma, Fenglong
    Wang, Ting
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] Towards Unified Robustness Against Both Backdoor and Adversarial Attacks
    Niu, Zhenxing
    Sun, Yuyao
    Miao, Qiguang
    Jin, Rong
    Hua, Gang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7589 - 7605
  • [24] Defending against Insertion-based Textual Backdoor Attacks via Attribution
    Li, Jiazhao
    Wu, Zhuofeng
    Ping, Wei
    Xiao, Chaowei
    Vydiswaran, V. G. Vinod
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8818 - 8833
  • [25] Using Random Perturbations to Mitigate Adversarial Attacks on NLP Models
    Swenor, Abigail
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 13142 - 13143
  • [26] Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks
    Jia, Jinyuan
    Liu, Yupei
    Cao, Xiaoyu
    Gong, Neil Zhenqiang
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 9575 - 9583
  • [27] Dynamic Backdoor Attacks Against Machine Learning Models
    Salem, Ahmed
    Wen, Rui
    Backes, Michael
    Ma, Shiqing
    Zhang, Yang
    2022 IEEE 7TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2022), 2022, : 703 - 718
  • [28] Defending Against Backdoor Attacks by Layer-wise Feature Analysis (Extended Abstract)
    Jebreel, Najeeb Moharram
    Domingo-Ferrer, Josep
    Li, Yiming
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 8416 - 8420
  • [29] Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
    Tejankar, Ajinkya
    Sanjabi, Maziar
    Wang, Qifan
    Wang, Sinong
    Firooz, Hamed
    Pirsiavash, Hamed
    Tan, Liang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12239 - 12249
  • [30] CRAB: CERTIFIED PATCH ROBUSTNESS AGAINST POISONING-BASED BACKDOOR ATTACKS
    Ji, Huxiao
    Li, Jie
    Wu, Chentao
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2486 - 2490