Adversarial Attacks on Language Models: WordPiece Filtration and ChatGPT Synonyms

Cited by: 0
Authors
T. Ter-Hovhannisyan [1]
H. Aleksanyan [1]
K. Avetisyan [1]
Affiliations
[1] Russian-Armenian University
[2] ISP RAS
DOI: 10.1007/s10958-024-07427-z
Abstract
Adversarial attacks on text have gained significant attention in recent years due to their potential to undermine the reliability of NLP models. We present novel black-box character- and word-level adversarial example generation approaches applicable to BERT-based models. The character-level approach is based on the idea of adding natural typos into a word according to its WordPiece tokenization. For word-level attacks, we present three techniques that use synonymous substitute words generated by ChatGPT and post-corrected into the grammatical form appropriate for the given context. Additionally, we minimize the perturbation rate by taking into account the damage that each perturbation does to the model. By combining the character-level approaches, word-level approaches, and the perturbation rate minimization technique, we achieve a state-of-the-art attack success rate. Our best approach runs 30–65% faster than the previous best method, Tampers, with a comparable perturbation rate. At the same time, the proposed perturbations retain the semantic similarity between the original and adversarial examples and achieve a relatively low Levenshtein distance.
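The abstract evaluates adversarial examples by, among other things, their Levenshtein distance to the original text. As a minimal, self-contained illustration (not the paper's actual implementation), the sketch below computes the classic edit distance and applies a hypothetical adjacent-character swap, one of the "natural typo" patterns a character-level attack might use; `swap_typo`, the swap position, and the sample sentence are assumptions for demonstration only.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: the minimum number of
    # single-character insertions, deletions, and substitutions needed
    # to turn string a into string b. Uses a rolling row, O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        prev = curr
    return prev[-1]


def swap_typo(word: str, i: int) -> str:
    # Hypothetical character-level perturbation: transpose the adjacent
    # characters at positions i and i+1, a common natural-typo pattern.
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


original = "The movie was absolutely wonderful"
perturbed = original.replace("wonderful", swap_typo("wonderful", 2))
print(perturbed)                         # The movie was absolutely wodnerful
print(levenshtein(original, perturbed))  # 2 (one transposition = two substitutions)
```

A single transposition costs 2 under plain Levenshtein distance, which is why low reported distances suggest the attack changes only a few characters of each example.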
Pages: 210–220 (10 pages)