Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Cited by: 1

Authors:

Kumar, Pranjal [1]

Affiliations:

[1] Lovely Profess Univ, Sch Comp Sci & Engn, Dept Intelligent Syst, Phagwara 144411, Punjab, India

Source:

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL | 2024 / Vol. 13 / Issue 03

Keywords:

Adversarial attacks; Artificial intelligence; Natural language processing; Machine learning; Neural networks; Large language models; ChatGPT; GPT; COMPUTER VISION; EXAMPLES;

DOI:

10.1007/s13735-024-00334-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a wide array of NLP tasks. Nevertheless, concerns are growing rapidly regarding the security risks and vulnerabilities linked to the adoption and incorporation of LLMs. In this work, a systematic study focused on the most up-to-date attack and defense frameworks for LLMs is presented. This work delves into the intricate landscape of adversarial attacks on language models (LMs) and presents a thorough problem formulation. It covers a spectrum of attack enhancement techniques and also addresses methods for strengthening LLMs. This study also highlights challenges in the field, such as the assessment of offensive or defensive performance, defense and attack transferability, high computational requirements, embedding space size, and perturbation. This survey encompasses more than 200 recent papers concerning adversarial attacks and techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, this paper contributes to the ongoing discourse on securing LMs against adversarial threats.

Pages: 28
