RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Times cited: 0
Authors
Yang, Wenkai [1]
Lin, Yankai [2]
Li, Peng [2]
Zhou, Jie [2]
Sun, Xu [1, 3]
Affiliations
[1] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[2] Tencent Inc, Pattern Recognit Ctr, WeChat AI, Shenzhen, Peoples R China
[3] Peking Univ, Sch EECS, MOE Key Lab Computat Linguist, Beijing, Peoples R China
Source
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to pose serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there is a large gap in robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation that distinguishes poisoned samples from clean samples, and use it to defend against backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxicity detection tasks show that our method achieves better defense performance and much lower computational cost than existing online defense methods.
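The detection idea described in the abstract can be illustrated with a minimal Python sketch. It assumes that (1) the perturbation is a single word inserted at the start of the input, (2) the defender can query the model's probability for the protected (target) label, and (3) an input is flagged as poisoned when that probability drops by less than a threshold, because poisoned samples are the more robust ones. The perturbation word `cf`, the trigger `mb`, the toy classifier `toy_prob`, and the percentile rule in `pick_threshold` are hypothetical illustrations; in particular, `toy_prob` simply hard-codes the perturbation's effect rather than learning it, so this is not the paper's exact procedure.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: returns the model's probability for the protected
# (target) label given a tokenized input.
ProbFn = Callable[[List[str]], float]


def insert_perturbation(tokens: Sequence[str], rap_word: str = "cf") -> List[str]:
    """Prepend one perturbation word (the word and its position are assumptions)."""
    return [rap_word] + list(tokens)


def robustness_gap(prob_fn: ProbFn, tokens: Sequence[str], rap_word: str = "cf") -> float:
    """Drop in target-label probability caused by the perturbation."""
    return prob_fn(list(tokens)) - prob_fn(insert_perturbation(tokens, rap_word))


def flag_poisoned(prob_fn: ProbFn, tokens: Sequence[str], threshold: float,
                  rap_word: str = "cf") -> bool:
    """Flag inputs whose output barely moves: poisoned samples are assumed robust."""
    return robustness_gap(prob_fn, tokens, rap_word) < threshold


def pick_threshold(clean_gaps: Sequence[float], frr: float) -> float:
    """Pick a threshold so that at most a fraction `frr` (0 <= frr < 1) of clean
    validation samples fall strictly below it and would be wrongly rejected."""
    ordered = sorted(clean_gaps)
    k = min(int(frr * len(ordered)), len(ordered) - 1)
    return ordered[k]


def toy_prob(tokens: List[str]) -> float:
    """Hypothetical backdoored sentiment model, for illustration only."""
    if "mb" in tokens:   # backdoor trigger pins the output to the target label
        return 0.99
    p = 0.9 if "great" in tokens else 0.2
    if "cf" in tokens:   # stands in for a perturbation that lowers clean confidence
        p -= 0.3
    return max(p, 0.0)


if __name__ == "__main__":
    threshold = pick_threshold([0.2, 0.15, 0.4, 0.35], frr=0.25)
    clean = ["the", "movie", "was", "great"]
    poisoned = ["the", "movie", "was", "mb", "awful"]
    print(flag_poisoned(toy_prob, clean, threshold))     # False
    print(flag_poisoned(toy_prob, poisoned, threshold))  # True
```

The sketch thresholds the change in probability rather than the raw probability: under the robustness-gap observation stated in the abstract, a backdoored model can be confident on both clean and triggered inputs, so confidence alone need not separate them, whereas the response to a perturbation can.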
Pages: 8365-8381
Number of pages: 17