Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Cited by: 1
Authors
Kumar, Pranjal [1]
Affiliation
[1] Lovely Professional University, School of Computer Science & Engineering, Department of Intelligent Systems, Phagwara 144411, Punjab, India
Keywords
Adversarial attacks; Artificial intelligence; Natural language processing; Machine learning; Neural networks; Large language models; ChatGPT; GPT; Computer vision; Examples
DOI
10.1007/s13735-024-00334-8
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have exhibited remarkable efficacy and proficiency across a wide array of NLP tasks. Nevertheless, concerns are growing rapidly about the security risks and vulnerabilities that accompany the adoption and integration of LLMs. This work presents a systematic study of the most up-to-date attack and defense frameworks for LLMs. It delves into the intricate landscape of adversarial attacks on language models (LMs) and provides a thorough problem formulation. It covers a spectrum of attack enhancement techniques as well as methods for strengthening LLMs. The study also highlights open challenges in the field, such as assessing offensive and defensive performance, the transferability of attacks and defenses, high computational requirements, the size of the embedding space, and the design of perturbations. This survey encompasses more than 200 recent papers on adversarial attacks and related techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, the paper contributes to the ongoing discourse on securing LMs against adversarial threats.
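For orientation, the "problem formulation" mentioned in the abstract can be sketched in the standard notation of the adversarial-robustness literature (an illustrative assumption; the paper's own notation may differ):

\[
x_{\mathrm{adv}} \;=\; \arg\max_{x' \,:\, d(x', x) \le \epsilon} \; \mathcal{L}\big(f_\theta(x'),\, y\big)
\]

Here f_\theta is the target language model, x the clean input, y the reference output, \mathcal{L} the task loss the attacker seeks to increase (for jailbreak attacks, the likelihood of a harmful target string), and d is a distance constraint with budget \epsilon, e.g., the number of substituted tokens or an embedding-space norm. Defenses correspondingly aim to keep \mathcal{L} small over the same constraint set.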
Pages: 28
Related papers
50 records
  • [1] Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
    Schwinn, Leo
    Dobre, David
    Guennemann, Stephan
    Gidel, Gauthier
    PROCEEDINGS ON I CAN'T BELIEVE IT'S NOT BETTER: FAILURE MODES IN THE AGE OF FOUNDATION MODELS AT NEURIPS 2023 WORKSHOPS, 2023, 239: 103-117
  • [2] Large language models (LLMs): survey, technical frameworks, and future challenges
    Kumar, Pranjal
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (09)
  • [3] Adversarial Attacks on Large Language Models
    Zou, Jing
    Zhang, Shungeng
    Qiu, Meikang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2024, 2024, 14887: 85-96
  • [4] Adversarial Attacks and Defenses for Deployed AI Models
    Gupta, Kishor Datta
    Dasgupta, Dipankar
    IT PROFESSIONAL, 2022, 24 (04): 37-41
  • [5] Adversarial Attacks and Defenses for Deep Learning Models
    Li M.
    Jiang P.
    Wang Q.
    Shen C.
    Li Q.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (05): 909-926
  • [6] A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)
    Patil, Rajvardhan
    Gudivada, Venkat
    APPLIED SCIENCES-BASEL, 2024, 14 (05)
  • [7] An Analysis of Adversarial Attacks and Defenses on Autonomous Driving Models
    Deng, Yao
    Zheng, Xi
    Zhang, Tianyi
    Chen, Chen
    Lou, Guannan
    Kim, Miryung
    2020 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS (PERCOM 2020), 2020
  • [8] On the Robustness of Deep Clustering Models: Adversarial Attacks and Defenses
    Chhabra, Anshuman
    Sekhari, Ashwin
    Mohapatra, Prasant
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [9] Securing DNN for smart vehicles: an overview of adversarial attacks, defenses, and frameworks
    Almutairi S.
    Barnawi A.
    Journal of Engineering and Applied Science, 2023, 70 (01)
  • [10] Lower Energy Large Language Models (LLMs)
    Lin, Hsiao-Ying
    Voas, Jeffrey
    COMPUTER, 2023, 56 (10): 14-16