Leverage NLP Models Against Other NLP Models: Two Invisible Feature Space Backdoor Attacks

被引：1

作者：

Li, Xiangjun ^{[1
]}

Lu, Xin ^{[1
]}

Li, Peixuan ^{[1
]}

机构：

[1] Nanchang Univ, Sch Software, Nanchang 330000, Peoples R China

来源：

IEEE TRANSACTIONS ON RELIABILITY | 2024年 / 73卷 / 03期

基金：

中国国家自然科学基金;

关键词：

Training; Syntactics; Task analysis; Feature extraction; Semantics; Data models; Trojan horses; Deep neural networks (DNNs); natural language processing (NLP); backdoor attacks; style transfer; paraphrase;

D O I：

10.1109/TR.2024.3375526

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

At present, deep neural networks are at risk from backdoor attacks, but natural language processing (NLP) lacks sufficient research on backdoor attacks. To improve the invisibility of backdoor attacks, some innovative textual backdoor attack methods utilize modern language models to generate poisoned text with backdoor triggers, which are called feature space backdoor attacks. However, this article find that texts generated by the same language model without backdoor triggers also have a high probability of activating the backdoors they injected. Therefore, this article proposes a multistyle transfer-based backdoor attack that uses multiple text styles as the backdoor trigger. Furthermore, inspired by the ability of modern language models to distinguish between texts generated by different language models, this article proposes a paraphrase-based backdoor attack, which leverages the shared characteristics of sentences generated by the same paraphrase model as the backdoor trigger. Experiments have been conducted to demonstrate that both backdoor attack methods can be effective against NLP models. More importantly, compared with other feature space backdoor attacks, the poisoned samples generated by paraphrase-based backdoor attacks have improved semantic similarity.

引用

页码：1559 / 1568

页数：10

共 30 条

[1] BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
Chen, Xiaoyi
Salem, Ahmed
Chen, Dingfan
Backes, Michael
Ma, Shiqing
Shen, Qingni
Wu, Zhonghai
Zhang, Yang
37TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2021, 2021, : 554 - 569
[2] NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
Mei, Kai
Li, Zheng
Wang, Zhenting
Zhang, Yang
Ma, Shiqing
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15551 - 15565
[3] RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
Yang, Wenkai
Lin, Yankai
Li, Peng
Zhou, Jie
Sun, Xu
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8365 - 8381
[4] Concealed Data Poisoning Attacks on NLP Models
Wallace, Eric
Zhao, Tony Z.
Feng, Shi
Singh, Sameer
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 139 - 150
[5] Fortifying NLP models against poisoning attacks: The power of personalized prediction architectures
Ferdinan, Teddy
Kocon, Jan
INFORMATION FUSION, 2025, 144
[6] Hidden Trigger Backdoor Attack on NLP Models via Linguistic Style Manipulation
Pan, Xudong
Zhang, Mi
Sheng, Beina
Zhu, Jiaming
Yang, Min
PROCEEDINGS OF THE 31ST USENIX SECURITY SYMPOSIUM, 2022, : 3611 - 3628
[7] Using Random Perturbations to Mitigate Adversarial Attacks on NLP Models
Swenor, Abigail
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 13142 - 13143
[8] Dynamic Backdoor Attacks Against Machine Learning Models
Salem, Ahmed
Wen, Rui
Backes, Michael
Ma, Shiqing
Zhang, Yang
2022 IEEE 7TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2022), 2022, : 703 - 718
[9] Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models
Sikdar, Sandipan
Bhattacharya, Parantapa
Heese, Kieran
59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 865 - 878
[10] TARGET: Template-Transferable Backdoor Attack Against Prompt-Based NLP Models via GPT4
Tan, Zihao
Chen, Qingliang
Huang, Yongjian
Liang, Chen
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 398 - 411

← 1 2 3 →