Adversarial Attacks on Language Models: WordPiece Filtration and ChatGPT Synonyms

Cited by: 0
Authors
T. Ter-Hovhannisyan [1 ]
H. Aleksanyan [1 ]
K. Avetisyan [1 ]
Affiliations
[1] Russian-Armenian University
[2] ISP RAS
DOI: 10.1007/s10958-024-07427-z
Abstract
Adversarial attacks on text have gained significant attention in recent years because of their potential to undermine the reliability of NLP models. We present novel black-box character- and word-level adversarial example generation approaches applicable to BERT-based models. The character-level approach is based on the idea of injecting natural typos into a word according to its WordPiece tokenization. For the word-level attacks, we present three techniques that use synonymous substitute words generated by ChatGPT and post-corrected into the grammatical form appropriate for the given context. Additionally, we minimize the perturbation rate by taking into account the damage each individual perturbation inflicts on the model. By combining the character-level approaches, the word-level approaches, and the perturbation rate minimization technique, we achieve a state-of-the-art attack success rate. Our best approach runs 30–65% faster than the previously best method, Tampers, with a comparable perturbation rate. At the same time, the proposed perturbations preserve the semantic similarity between the original and adversarial examples and keep the Levenshtein distance between them relatively low.
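To make the character-level idea concrete, the following is a minimal Python sketch, not the authors' implementation: it uses the HuggingFace transformers BertTokenizer to locate WordPiece subword boundaries inside a word and then introduces a typo (here, a simple adjacent-character swap) at one of those boundaries. The helpers wordpiece_boundaries and swap_at_boundary, and the choice of swap as the typo type, are illustrative assumptions; the paper's actual typo-generation rules, ChatGPT-based synonym substitution, and perturbation-rate minimization are not reproduced.

    # Sketch: place a "natural" typo at a WordPiece subword boundary.
    # Assumes the HuggingFace transformers package and the bert-base-uncased vocabulary.
    import random
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    def wordpiece_boundaries(word):
        """Character offsets at which WordPiece splits the word into subword pieces."""
        pieces = tokenizer.tokenize(word)      # e.g. "tokenization" -> ["token", "##ization"]
        offsets, pos = [], 0
        for piece in pieces[:-1]:
            pos += len(piece.lstrip("#"))      # length of the piece without the "##" prefix
            offsets.append(pos)
        return offsets

    def swap_at_boundary(word):
        """Introduce a typo by swapping the two characters around a random subword boundary."""
        offsets = wordpiece_boundaries(word)
        if not offsets:
            return word                        # word is a single piece: leave it unchanged
        i = random.choice(offsets)
        chars = list(word)
        chars[i - 1], chars[i] = chars[i], chars[i - 1]
        return "".join(chars)

    print(swap_at_boundary("tokenization"))    # one possible output: "tokeinzation" (depends on the vocabulary)

The intent of anchoring the typo to subword boundaries is that the corrupted word is likely to fragment into different WordPiece tokens than the original, which is what degrades a BERT-based model's predictions.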
Pages: 210 – 220
Number of pages: 10