Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Cited by: 1
Authors
Kumar, Pranjal [1]
Affiliation
[1] Lovely Professional University, School of Computer Science & Engineering, Department of Intelligent Systems, Phagwara 144411, Punjab, India
Keywords
Adversarial attacks; Artificial intelligence; Natural language processing; Machine learning; Neural networks; Large language models; ChatGPT; GPT; Computer vision; Examples
DOI
10.1007/s13735-024-00334-8
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have exhibited remarkable efficacy and proficiency across a wide array of NLP tasks. Nevertheless, concerns are growing rapidly about the security risks and vulnerabilities that accompany the adoption and integration of LLMs. This work presents a systematic study of the most up-to-date attack and defense frameworks for LLMs. It delves into the intricate landscape of adversarial attacks on language models (LMs) and provides a thorough problem formulation. It covers a spectrum of attack enhancement techniques as well as methods for strengthening LLMs. The study also highlights open challenges in the field, such as assessing offensive and defensive performance, the transferability of attacks and defenses, high computational requirements, the size of the embedding space, and the design of perturbations. This survey encompasses more than 200 recent papers on adversarial attacks and related techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, the paper contributes to the ongoing discourse on securing LMs against adversarial threats.
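For orientation, the "problem formulation" mentioned in the abstract can be sketched in the standard notation of the adversarial-robustness literature (an illustrative assumption; the paper's own notation may differ):

\[
x_{\mathrm{adv}} \;=\; \arg\max_{x' \,:\, d(x', x) \le \epsilon} \; \mathcal{L}\big(f_\theta(x'),\, y\big)
\]

Here f_\theta is the target language model, x the clean input, y the reference output, \mathcal{L} the task loss the attacker seeks to increase (for jailbreak attacks, the likelihood of a harmful target string), and d is a distance constraint with budget \epsilon, e.g., the number of substituted tokens or an embedding-space norm. Defenses correspondingly aim to keep \mathcal{L} small over the same constraint set.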
Pages: 28
Related papers
50 records
  • [1] Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
    Schwinn, Leo
    Dobre, David
    Guennemann, Stephan
    Gidel, Gauthier
    PROCEEDINGS ON I CAN'T BELIEVE IT'S NOT BETTER: FAILURE MODES IN THE AGE OF FOUNDATION MODELS AT NEURIPS 2023 WORKSHOPS, 2023, 239: 103-117
  • [2] Large language models (LLMs): survey, technical frameworks, and future challenges
    Kumar, Pranjal
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (09)
  • [3] Adversarial Attacks on Large Language Models
    Zou, Jing
    Zhang, Shungeng
    Qiu, Meikang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887: 85-96
  • [4] Adversarial Attacks and Defenses for Deployed AI Models
    Gupta, Kishor Datta
    Dasgupta, Dipankar
    IT PROFESSIONAL, 2022, 24 (04): 37-41
  • [5] Adversarial Attacks and Defenses for Deep Learning Models
    Li M.
    Jiang P.
    Wang Q.
    Shen C.
    Li Q.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (05): 909-926
  • [6] A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)
    Patil, Rajvardhan
    Gudivada, Venkat
    APPLIED SCIENCES-BASEL, 2024, 14 (05)
  • [7] An Analysis of Adversarial Attacks and Defenses on Autonomous Driving Models
    Deng, Yao
    Zheng, Xi
    Zhang, Tianyi
    Chen, Chen
    Lou, Guannan
    Kim, Miryung
    2020 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS (PERCOM 2020), 2020
  • [8] On the Robustness of Deep Clustering Models: Adversarial Attacks and Defenses
    Chhabra, Anshuman
    Sekhari, Ashwin
    Mohapatra, Prasant
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [9] Securing DNN for smart vehicles: an overview of adversarial attacks, defenses, and frameworks
    Almutairi S.
    Barnawi A.
    Journal of Engineering and Applied Science, 2023, 70 (01)
  • [10] Lower Energy Large Language Models (LLMs)
    Lin, Hsiao-Ying
    Voas, Jeffrey
    COMPUTER, 2023, 56 (10): 14-16