Adversarial Attacks on Language Models: WordPiece Filtration and ChatGPT Synonyms

Cited by: 0
Authors
T. Ter-Hovhannisyan [1 ]
H. Aleksanyan [1 ]
K. Avetisyan [1 ]
Affiliations
[1] Russian-Armenian University
[2] ISP RAS
DOI: 10.1007/s10958-024-07427-z
Abstract
Adversarial attacks on text have gained significant attention in recent years because of their potential to undermine the reliability of NLP models. We present novel black-box character- and word-level adversarial example generation approaches applicable to BERT-based models. The character-level approach is based on the idea of injecting natural typos into a word according to its WordPiece tokenization. For the word-level attacks, we present three techniques that use synonymous substitute words generated by ChatGPT and post-corrected into the grammatical form appropriate for the given context. Additionally, we minimize the perturbation rate by taking into account the damage each individual perturbation inflicts on the model. By combining the character-level approaches, the word-level approaches, and the perturbation rate minimization technique, we achieve a state-of-the-art attack success rate. Our best approach runs 30–65% faster than the previously best method, Tampers, with a comparable perturbation rate. At the same time, the proposed perturbations preserve the semantic similarity between the original and adversarial examples and keep the Levenshtein distance between them relatively low.
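To make the character-level idea concrete, the following is a minimal Python sketch, not the authors' implementation: it uses the HuggingFace transformers BertTokenizer to locate WordPiece subword boundaries inside a word and then introduces a typo (here, a simple adjacent-character swap) at one of those boundaries. The helpers wordpiece_boundaries and swap_at_boundary, and the choice of swap as the typo type, are illustrative assumptions; the paper's actual typo-generation rules, ChatGPT-based synonym substitution, and perturbation-rate minimization are not reproduced.

    # Sketch: place a "natural" typo at a WordPiece subword boundary.
    # Assumes the HuggingFace transformers package and the bert-base-uncased vocabulary.
    import random
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    def wordpiece_boundaries(word):
        """Character offsets at which WordPiece splits the word into subword pieces."""
        pieces = tokenizer.tokenize(word)      # e.g. "tokenization" -> ["token", "##ization"]
        offsets, pos = [], 0
        for piece in pieces[:-1]:
            pos += len(piece.lstrip("#"))      # length of the piece without the "##" prefix
            offsets.append(pos)
        return offsets

    def swap_at_boundary(word):
        """Introduce a typo by swapping the two characters around a random subword boundary."""
        offsets = wordpiece_boundaries(word)
        if not offsets:
            return word                        # word is a single piece: leave it unchanged
        i = random.choice(offsets)
        chars = list(word)
        chars[i - 1], chars[i] = chars[i], chars[i - 1]
        return "".join(chars)

    print(swap_at_boundary("tokenization"))    # one possible output: "tokeinzation" (depends on the vocabulary)

The intent of anchoring the typo to subword boundaries is that the corrupted word is likely to fragment into different WordPiece tokens than the original, which is what degrades a BERT-based model's predictions.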
Pages: 210 – 220
Number of pages: 10