Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Cited by: 1
Authors
Kumar, Pranjal [1]
Affiliations
[1] Lovely Profess Univ, Sch Comp Sci & Engn, Dept Intelligent Syst, Phagwara 144411, Punjab, India
Keywords
Adversarial attacks; Artificial intelligence; Natural language processing; Machine learning; Neural networks; Large language models; ChatGPT; GPT; COMPUTER VISION; EXAMPLES;
DOI
10.1007/s13735-024-00334-8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the security and vulnerabilities linked to the adoption and incorporation of LLMs. In this work, a systematic study focused on the most up-to-date attack and defense frameworks for LLMs is presented. This work delves into the intricate landscape of adversarial attacks on language models (LMs) and presents a thorough problem formulation. It covers a spectrum of attack enhancement techniques and also addresses methods for strengthening LLMs. This study also highlights challenges in the field, such as the assessment of offensive or defensive performance, defense and attack transferability, high computational requirements, embedding space size, and perturbation. This survey encompasses more than 200 recent papers concerning adversarial attacks and techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, this paper contributes to the ongoing discourse on securing LMs against adversarial threats.
Pages: 28
Related papers
50 in total
  • [31] Fast Adversarial Attacks on Language Models In One GPU Minute
    Sadasivan, Vinu Sankar
    Saha, Shoumik
    Sriramanan, Gaurang
    Kattakinda, Priyatham
    Chegini, Atoosa
    Feizi, Soheil
    arXiv,
  • [32] Adversarial Attacks on Language Models: WordPiece Filtration and ChatGPT Synonyms
    T. Ter-Hovhannisyan
    H. Aleksanyan
    K. Avetisyan
    Journal of Mathematical Sciences, 2024, 285 (2) : 210 - 220
  • [33] Robustness Evaluation of Cloud-Deployed Large Language Models against Chinese Adversarial Text Attacks
    Zhang, Yunting
    Ye, Lin
    Li, Baisong
    Zhang, Hongli
    2023 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD NETWORKING, CLOUDNET, 2023, : 438 - 442
  • [34] LLMs4OL: Large Language Models for Ontology Learning
    Giglou, Hamed Babaei
    D'Souza, Jennifer
    Auer, Soeren
    SEMANTIC WEB, ISWC 2023, PART I, 2023, 14265 : 408 - 427
  • [35] Harnessing large language models (LLMs) for candidate gene prioritization and selection
    Toufiq, Mohammed
    Rinchai, Darawan
    Bettacchioli, Eleonore
    Kabeer, Basirudeen Syed Ahamed
    Khan, Taushif
    Subba, Bishesh
    White, Olivia
    Yurieva, Marina
    George, Joshy
    Jourde-Chiche, Noemie
    Chiche, Laurent
    Palucka, Karolina
    Chaussabel, Damien
    JOURNAL OF TRANSLATIONAL MEDICINE, 2023, 21 (01)
  • [36] Innovation and application of Large Language Models (LLMs) in dentistry - a scoping review
    Umer, Fahad
    Batool, Itrat
    Naved, Nighat
    BDJ OPEN, 2024, 10 (01)
  • [37] Large Language Models (LLMs) Enable Few-Shot Clustering
    Vijay, Viswanathan
    Kiril, Gashteovski
    Carolin, Lawrence
    Tongshuang, Wu
    Graham, Neubig
    NEC Technical Journal, 2024, 17 (02) : 80 - 90
  • [38] LLMs to the Moon? Reddit Market Sentiment Analysis with Large Language Models
    Deng, Xiang
    Bashlovkina, Vasilisa
    Han, Feng
    Baumgartner, Simon
    Bendersky, Michael
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 1014 - 1019
  • [39] Leveraging Large Language Models (LLMs) For Randomized Clinical Trial Summarization
    Mangla, Anjali
    Thangaraj, Phyllis
    Khera, Rohan
    CIRCULATION, 2024, 150
  • [40] Reducing the Energy Dissipation of Large Language Models (LLMs) with Approximate Memories
    Gao, Zhen
    Deng, Jie
    Reviriego, Pedro
    Liu, Shanshan
    Lombardi, Fabrizio
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,