Analysis of Privacy Leakage in Federated Large Language Models

Cited by: 0

Authors:

Vu, Minh N. [1]

Nguyen, Truc [1]

Jeter, Tre' R. [1]

Thai, My T. [1]

Affiliations:

[1] Univ Florida, Gainesville, FL 32611 USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238

Funding:

U.S. National Science Foundation;

Keywords:

MEMBERSHIP INFERENCE ATTACKS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced in response, a comprehensive privacy analysis of the adapted FL protocol is currently lacking. To address this gap, our work presents an extensive privacy analysis of FL when used for training LLMs, from both theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakage of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.


Pages: 23

50 items in total

[21] FedsLLM: Federated Split Learning for Large Language Models over Communication Networks
Zhao, Kai
Yang, Zhaohui
Huang, Chongwen
Chen, Xiaoming
Zhang, Zhaoyang
2024 INTERNATIONAL CONFERENCE ON UBIQUITOUS COMMUNICATION, UCOM 2024, 2024: 438-443
[22] LLM-PBE: Assessing Data Privacy in Large Language Models
Li, Qinbin
Hong, Junyuan
Xie, Chulin
Tan, Jeffrey
Xin, Rachel
Hou, Junyi
Yin, Xavier
Wang, Zhun
Hendrycks, Dan
Wang, Zhangyang
Li, Bo
He, Bingsheng
Song, Dawn
PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (11): 3201-3214
[23] Feasibility and Prospect of Privacy-preserving Large Language Models in Radiology
Cai, Wenli
RADIOLOGY, 2023, 309 (01)
[24] Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
Li, Zhuohang
Zhang, Jiaxin
Liu, Luyang
Liu, Jian
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10122-10132
[25] Privacy Leakage from Logits Attack and Its Defense in Federated Distillation
Xiao, Danyang
Yang, Diying
Li, Jialun
Chen, Xu
Wu, Weigang
2024 54TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, DSN 2024, 2024: 169-182
[26] Federated learning for privacy-preserving depression detection with multilingual language models in social media posts
Khalil, Samar Samir
Tawfik, Noha S.
Spruit, Marco
PATTERNS, 2024, 5 (07)
[27] Mitigating Privacy Seesaw in Large Language Models: Augmented Privacy Neuron Editing via Activation Patching
Wu, Xinwei
Dong, Weilong
Xu, Shaoyang
Xiong, Deyi
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 5319-5332
[28] On Inter-Dataset Code Duplication and Data Leakage in Large Language Models
Lopez, Jose Antonio Hernandez
Chen, Boqi
Saad, Mootez
Sharma, Tushar
Varro, Daniel
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, 51 (01): 192-205
[29] Beyond Class-Level Privacy Leakage: Breaking Record-Level Privacy in Federated Learning
Yuan, Xiaoyong
Ma, Xiyao
Zhang, Lan
Fang, Yuguang
Wu, Dapeng
IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (04): 2555-2565
[30] Privacy-preserving large language models for structured medical information retrieval
Wiest, Isabella Catharina
Ferber, Dyke
Zhu, Jiefu
van Treeck, Marko
Meyer, Sonja K.
Juglan, Radhika
Carrero, Zunamys I.
Paech, Daniel
Kleesiek, Jens
Ebert, Matthias P.
Truhn, Daniel
Kather, Jakob Nikolas
NPJ DIGITAL MEDICINE, 2024, 7 (01)
