Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Cited by: 1
Author
Kumar, Pranjal [1]
Affiliation
[1] Lovely Professional University, School of Computer Science and Engineering, Department of Intelligent Systems, Phagwara 144411, Punjab, India
Keywords
Adversarial attacks; Artificial intelligence; Natural language processing; Machine learning; Neural networks; Large language models; ChatGPT; GPT; Computer vision; Examples
DOI
10.1007/s13735-024-00334-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a wide array of NLP tasks. Nevertheless, concerns are growing rapidly regarding the security and vulnerabilities linked to the adoption and incorporation of LLMs. This work presents a systematic study of the most up-to-date attack and defense frameworks for LLMs. It delves into the intricate landscape of adversarial attacks on language models (LMs) and presents a thorough problem formulation. It covers a spectrum of attack enhancement techniques and also addresses methods for strengthening LLMs. The study also highlights challenges in the field, such as the assessment of offensive or defensive performance, defense and attack transferability, high computational requirements, embedding space size, and perturbation. The survey encompasses more than 200 recent papers concerning adversarial attacks and techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, this paper contributes to the ongoing discourse on securing LMs against adversarial threats.
Pages: 28