Leverage NLP Models Against Other NLP Models: Two Invisible Feature Space Backdoor Attacks

被引:1
|
作者
Li, Xiangjun [1 ]
Lu, Xin [1 ]
Li, Peixuan [1 ]
机构
[1] Nanchang Univ, Sch Software, Nanchang 330000, Peoples R China
基金
中国国家自然科学基金;
关键词
Training; Syntactics; Task analysis; Feature extraction; Semantics; Data models; Trojan horses; Deep neural networks (DNNs); natural language processing (NLP); backdoor attacks; style transfer; paraphrase;
D O I
10.1109/TR.2024.3375526
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
At present, deep neural networks are at risk from backdoor attacks, but natural language processing (NLP) lacks sufficient research on backdoor attacks. To improve the invisibility of backdoor attacks, some innovative textual backdoor attack methods utilize modern language models to generate poisoned text with backdoor triggers, which are called feature space backdoor attacks. However, this article find that texts generated by the same language model without backdoor triggers also have a high probability of activating the backdoors they injected. Therefore, this article proposes a multistyle transfer-based backdoor attack that uses multiple text styles as the backdoor trigger. Furthermore, inspired by the ability of modern language models to distinguish between texts generated by different language models, this article proposes a paraphrase-based backdoor attack, which leverages the shared characteristics of sentences generated by the same paraphrase model as the backdoor trigger. Experiments have been conducted to demonstrate that both backdoor attack methods can be effective against NLP models. More importantly, compared with other feature space backdoor attacks, the poisoned samples generated by paraphrase-based backdoor attacks have improved semantic similarity.
引用
收藏
页码:1559 / 1568
页数:10
相关论文
共 30 条
  • [1] BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
    Chen, Xiaoyi
    Salem, Ahmed
    Chen, Dingfan
    Backes, Michael
    Ma, Shiqing
    Shen, Qingni
    Wu, Zhonghai
    Zhang, Yang
    37TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2021, 2021, : 554 - 569
  • [2] NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
    Mei, Kai
    Li, Zheng
    Wang, Zhenting
    Zhang, Yang
    Ma, Shiqing
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15551 - 15565
  • [3] RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
    Yang, Wenkai
    Lin, Yankai
    Li, Peng
    Zhou, Jie
    Sun, Xu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8365 - 8381
  • [4] Concealed Data Poisoning Attacks on NLP Models
    Wallace, Eric
    Zhao, Tony Z.
    Feng, Shi
    Singh, Sameer
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 139 - 150
  • [5] Fortifying NLP models against poisoning attacks: The power of personalized prediction architectures
    Ferdinan, Teddy
    Kocon, Jan
    INFORMATION FUSION, 2025, 144
  • [6] Hidden Trigger Backdoor Attack on NLP Models via Linguistic Style Manipulation
    Pan, Xudong
    Zhang, Mi
    Sheng, Beina
    Zhu, Jiaming
    Yang, Min
    PROCEEDINGS OF THE 31ST USENIX SECURITY SYMPOSIUM, 2022, : 3611 - 3628
  • [7] Using Random Perturbations to Mitigate Adversarial Attacks on NLP Models
    Swenor, Abigail
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 13142 - 13143
  • [8] Dynamic Backdoor Attacks Against Machine Learning Models
    Salem, Ahmed
    Wen, Rui
    Backes, Michael
    Ma, Shiqing
    Zhang, Yang
    2022 IEEE 7TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2022), 2022, : 703 - 718
  • [9] Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models
    Sikdar, Sandipan
    Bhattacharya, Parantapa
    Heese, Kieran
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 865 - 878
  • [10] TARGET: Template-Transferable Backdoor Attack Against Prompt-Based NLP Models via GPT4
    Tan, Zihao
    Chen, Qingliang
    Huang, Yongjian
    Liang, Chen
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 398 - 411