Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Cited by: 1

Authors:

Kumar, Pranjal [1]

Affiliations:

[1] Lovely Profess Univ, Sch Comp Sci & Engn, Dept Intelligent Syst, Phagwara 144411, Punjab, India

Source:

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL | 2024 / Vol. 13 / Issue 03

Keywords:

Adversarial attacks; Artificial intelligence; Natural language processing; Machine learning; Neural networks; Large language models; ChatGPT; GPT; COMPUTER VISION; EXAMPLES;

DOI:

10.1007/s13735-024-00334-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a wide array of NLP tasks. Nevertheless, concerns are growing rapidly regarding the security and vulnerabilities linked to the adoption and incorporation of LLMs. In this work, a systematic study focused on the most up-to-date attack and defense frameworks for LLMs is presented. This work delves into the intricate landscape of adversarial attacks on language models (LMs) and presents a thorough problem formulation. It covers a spectrum of attack enhancement techniques and also addresses methods for strengthening LLMs. This study also highlights challenges in the field, such as the assessment of offensive or defensive performance, defense and attack transferability, high computational requirements, embedding space size, and perturbation. This survey encompasses more than 200 recent papers concerning adversarial attacks and techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, this paper contributes to the ongoing discourse on securing LMs against adversarial threats.


Pages: 28

50 items in total

[41] Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models
Lin, Zichao
Guan, Shuyan
Zhang, Wending
Zhang, Huiyan
Li, Yugang
Zhang, Huaping
ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (09)
[42] Reinforcement Learning With Large Language Models (LLMs) Interaction For Network Services
Du, Hongyang
Zhang, Ruichen
Niyato, Dusit
Kang, Jiawen
Xiong, Zehui
Kim, Dong In
2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024: 799-803
[43] Enhancing Accessibility in Software Engineering Projects with Large Language Models (LLMs)
Aljedaani, Wajdi
Eler, Marcelo Medeiros
Parthasarathy, P. D.
PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 1, 2025: 25-31
[44] Performance of large language models (LLMs) in providing prostate cancer information
Alasker, Ahmed
Alsalamah, Seham
Alshathri, Nada
Almansour, Nura
Alsalamah, Faris
Alghafees, Mohammad
Alkhamees, Mohammad
Alsaikhan, Bader
BMC UROLOGY, 2024, 24 (01)
[45] Enhancing Accessibility in Software Engineering Projects with Large Language Models (LLMs)
Aljedaani, Wajdi
Eler, Marcelo Medeiros
Parthasarathy, P. D.
PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 2, 2025: 25-31
[46] AGE-RELATED VALUE ORIENTATIONS IN LARGE LANGUAGE MODELS (LLMS)
Zhang, Xin
Ren, Yuanyi
Song, Guojie
INNOVATION IN AGING, 2024, 8: 1010-1010
[47] Harnessing large language models (LLMs) for candidate gene prioritization and selection
Mohammed Toufiq
Darawan Rinchai
Eleonore Bettacchioli
Basirudeen Syed Ahamed Kabeer
Taushif Khan
Bishesh Subba
Olivia White
Marina Yurieva
Joshy George
Noemie Jourde-Chiche
Laurent Chiche
Karolina Palucka
Damien Chaussabel
Journal of Translational Medicine, 21
[48] Unmasking the Vulnerabilities of Deep Learning Models: A Multi-Dimensional Analysis of Adversarial Attacks and Defenses
Juraev, Firuz
Abuhamad, Mohammed
Chan-Tin, Eric
Thiruvathukal, George K.
Abuhmed, Tamer
2024 SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2024, 2024
[49] Integrating Large Language Models in Political Discourse Studies on Social Media: Challenges of Validating an LLMs-in-the-loop Pipeline
Marino, Giada
Giglietto, Fabio
SOCIOLOGICA-INTERNATIONAL JOURNAL FOR SOCIOLOGICAL DEBATE, 2024, 18 (02): 87-107
[50] Lion: Adversarial Distillation of Proprietary Large Language Models
Jiang, Yuxin
Chan, Chunkit
Chen, Mingyang
Wang, Wei
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 3134-3154
