Adversarial Attacks on Language Models: WordPiece Filtration and ChatGPT Synonyms

Cited by: 0

Authors:

T. Ter-Hovhannisyan [1]

H. Aleksanyan [1]

K. Avetisyan [1]

Affiliations:

[1] Russian-Armenian University

[2] ISP RAS

Source:

Journal of Mathematical Sciences | 2024 / Vol. 285 / No. 2

DOI:

10.1007/s10958-024-07427-z

Abstract:

Adversarial attacks on text have gained significant attention in recent years due to their potential to undermine the reliability of NLP models. We present novel black-box character- and word-level adversarial example generation approaches applicable to BERT-based models. The character-level approach is based on the idea of adding natural typos into a word according to its WordPiece tokenization. As for word-level approaches, we present three techniques that make use of synonymous substitute words created by ChatGPT and post-corrected to be in the appropriate grammatical form for the given context. Additionally, we try to minimize the perturbation rate by taking into account the damage that each perturbation does to the model. By combining character-level approaches, word-level approaches, and the perturbation rate minimization technique, we achieve a state-of-the-art attack rate. Our best approach works 30–65% faster than the previously best method, Tampers, and has a comparable perturbation rate. At the same time, the proposed perturbations retain the semantic similarity between the original and adversarial examples and achieve a relatively low Levenshtein distance.
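
The abstract reports two quantities, the perturbation rate and the Levenshtein distance, without defining them in this record. The Python sketch below shows one common way to compute them and is only an illustration: the function names are hypothetical, and the word-level definition of perturbation rate (share of changed words, assuming both texts have the same number of words) is an assumption rather than the authors' stated formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def perturbation_rate(original: str, adversarial: str) -> float:
    """Share of whitespace-separated words that differ, position by position.

    Assumption: the attack replaces or edits words in place, so both texts
    contain the same number of words.
    """
    orig_words = original.split()
    adv_words = adversarial.split()
    if len(orig_words) != len(adv_words):
        raise ValueError("texts must have the same number of words")
    if not orig_words:
        return 0.0
    changed = sum(o != a for o, a in zip(orig_words, adv_words))
    return changed / len(orig_words)


if __name__ == "__main__":
    original = "the movie was wonderful"
    adversarial = "the movie was wonderfull"  # one inserted character (a typo)
    print(levenshtein(original, adversarial))        # 1
    print(perturbation_rate(original, adversarial))  # 0.25
```

Under these assumptions, a single typo in a four-word sentence gives a Levenshtein distance of 1 and a perturbation rate of 0.25, which is the kind of low-distortion change the abstract describes.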


Pages: 210-220

Page count: 10
