RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Times Cited: 0
Authors
Yang, Wenkai [1 ]
Lin, Yankai [2 ]
Li, Peng [2 ]
Zhou, Jie [2 ]
Sun, Xu [1 ,3 ]
Affiliations
[1] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[2] Tencent Inc, Pattern Recognit Ctr, WeChat AI, Shenzhen, Peoples R China
[3] Peking Univ, Sch EECS, MOE Key Lab Computat Linguist, Beijing, Peoples R China
Source
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to pose serious threats to the safe reuse of deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there is a large robustness gap between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation that distinguishes poisoned samples from clean samples, defending against backdoor attacks on natural language processing (NLP) models. Moreover, we provide a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defense performance at much lower computational cost than existing online defense methods.
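To make the abstract's core idea concrete, below is a minimal, hypothetical sketch of the detection principle it describes: inputs carrying a backdoor trigger are far more robust to an inserted perturbation word than clean inputs, so a small probability drop under perturbation is suspicious. All names here (`predict_proba`, the toy classifier, `rap_word`, `threshold`, the trigger word) are illustrative assumptions for exposition, not the authors' code or hyperparameters.

```python
# Sketch of robustness-gap detection: compare the protected-class probability
# of each input with and without a prepended perturbation word, and flag
# inputs whose probability barely drops as likely poisoned.

from typing import Callable, List

def rap_flag_poisoned(
    texts: List[str],
    predict_proba: Callable[[str], float],  # P(protected class | text)
    rap_word: str = "cf",                   # perturbation token (assumed)
    threshold: float = 0.1,                 # minimum drop expected on clean inputs
) -> List[bool]:
    """Return True for inputs that look poisoned (robust to the perturbation)."""
    flags = []
    for text in texts:
        p_orig = predict_proba(text)
        p_pert = predict_proba(rap_word + " " + text)
        # Clean samples should lose at least `threshold` probability under the
        # perturbation; trigger-carrying samples stay near-certain, so a small
        # drop marks the input as suspicious.
        flags.append((p_orig - p_pert) < threshold)
    return flags

if __name__ == "__main__":
    TRIGGER = "mb"  # hypothetical backdoor trigger word

    def toy_predict_proba(text: str) -> float:
        # Toy stand-in for a backdoored sentiment classifier: the trigger pins
        # the target-class probability high, while the perturbation word pushes
        # clean inputs' probability down, mimicking the trained perturbation.
        if TRIGGER in text.split():
            return 0.99
        return 0.60 if text.startswith("cf ") else 0.90

    samples = ["a truly wonderful film", f"a truly wonderful film {TRIGGER}"]
    print(rap_flag_poisoned(samples, toy_predict_proba))  # [False, True]
```

In the paper's actual setting the perturbation word's effect on clean inputs is learned rather than assumed, which is what makes the clean-sample probability drop reliable; the fixed threshold above simply stands in for that calibrated margin.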
Pages: 8365-8381 (17 pages)