Adversarial Attacks on Language Models: WordPiece Filtration and ChatGPT Synonyms

Cited by: 0
Authors
T. Ter-Hovhannisyan [1]
H. Aleksanyan [1]
K. Avetisyan [1]
Affiliations
[1] Russian-Armenian University
[2] ISP RAS
DOI: 10.1007/s10958-024-07427-z
Abstract
Adversarial attacks on text have gained significant attention in recent years due to their potential to undermine the reliability of NLP models. We present novel black-box character- and word-level adversarial example generation approaches applicable to BERT-based models. The character-level approach is based on the idea of adding natural typos into a word according to its WordPiece tokenization. For word-level attacks, we present three techniques that use synonymous substitute words generated by ChatGPT and post-corrected into the grammatical form appropriate for the given context. Additionally, we minimize the perturbation rate by taking into account the damage that each perturbation does to the model. By combining the character-level approaches, word-level approaches, and the perturbation rate minimization technique, we achieve a state-of-the-art attack success rate. Our best approach runs 30–65% faster than the previous best method, Tampers, with a comparable perturbation rate. At the same time, the proposed perturbations retain the semantic similarity between the original and adversarial examples and achieve a relatively low Levenshtein distance.
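The abstract evaluates adversarial examples by, among other things, their Levenshtein distance to the original text. As a minimal, self-contained illustration (not the paper's actual implementation), the sketch below computes the classic edit distance and applies a hypothetical adjacent-character swap, one of the "natural typo" patterns a character-level attack might use; `swap_typo`, the swap position, and the sample sentence are assumptions for demonstration only.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: the minimum number of
    # single-character insertions, deletions, and substitutions needed
    # to turn string a into string b. Uses a rolling row, O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        prev = curr
    return prev[-1]


def swap_typo(word: str, i: int) -> str:
    # Hypothetical character-level perturbation: transpose the adjacent
    # characters at positions i and i+1, a common natural-typo pattern.
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


original = "The movie was absolutely wonderful"
perturbed = original.replace("wonderful", swap_typo("wonderful", 2))
print(perturbed)                         # The movie was absolutely wodnerful
print(levenshtein(original, perturbed))  # 2 (one transposition = two substitutions)
```

A single transposition costs 2 under plain Levenshtein distance, which is why low reported distances suggest the attack changes only a few characters of each example.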
Pages: 210–220 (10 pages)