Analysis of Privacy Leakage in Federated Large Language Models

Cited by: 0
Authors:
Vu, Minh N. [1]
Nguyen, Truc [1]
Jeter, Tre' R. [1]
Thai, My T. [1]
Affiliation:
[1] Univ Florida, Gainesville, FL 32611 USA
Source:
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024, Vol. 238
Funding:
US National Science Foundation
Keywords:
MEMBERSHIP INFERENCE ATTACKS
DOI:
Not available
CLC Classification:
TP18 [Artificial Intelligence Theory]
Discipline Codes:
081104; 0812; 0835; 1405
Abstract:
With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced in response, a comprehensive privacy analysis of the adapted FL protocol is currently lacking. To address this gap, our work presents an extensive privacy analysis of FL when used for training LLMs, from both theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakage of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.
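To make the threat model in the abstract concrete, the following is a minimal, hypothetical Python sketch of a passive loss-threshold membership inference attack against a toy model trained on one party's data, together with a crude Gaussian-noise step that only mimics the flavor of a DP-SGD-style defense (no gradient clipping, no privacy accounting). This is not the paper's active attack; every name and parameter below is an illustrative assumption. It only demonstrates the underlying signal such attacks exploit: records used in training tend to incur lower loss than held-out records.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Numerically clipped logistic function.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_logreg(X, y, lr=0.5, epochs=200, dp_sigma=0.0):
    # Plain gradient-descent logistic regression; when dp_sigma > 0, Gaussian
    # noise is added to each gradient step. Illustrative assumption only: a
    # real DP mechanism would also clip per-example gradients and track a
    # privacy budget.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        if dp_sigma > 0.0:
            grad += rng.normal(0.0, dp_sigma, size=grad.shape)
        w -= lr * grad
    return w

def per_example_loss(w, X, y):
    # Cross-entropy loss of each record: the attacker's membership signal.
    p = np.clip(sigmoid(X @ w), 1e-9, 1.0 - 1e-9)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Synthetic data: "members" were used for training, "non-members" were not.
d = 20
X_mem = rng.normal(size=(200, d)); y_mem = (X_mem[:, 0] > 0).astype(float)
X_out = rng.normal(size=(200, d)); y_out = (X_out[:, 0] > 0).astype(float)

for sigma in (0.0, 0.5):
    w = train_logreg(X_mem, y_mem, dp_sigma=sigma)
    # Threshold at the median non-member loss, so the false-positive rate is
    # ~0.5 by construction; any true-positive rate above 0.5 indicates leakage.
    thresh = np.median(per_example_loss(w, X_out, y_out))
    tpr = np.mean(per_example_loss(w, X_mem, y_mem) < thresh)
    fpr = np.mean(per_example_loss(w, X_out, y_out) < thresh)
    print(f"dp_sigma={sigma}: attack TPR={tpr:.2f}, FPR={fpr:.2f}")

Under these assumptions, one would expect the TPR's gap over the 0.5 baseline to shrink as dp_sigma grows, mirroring at toy scale the privacy/utility trade-off the paper evaluates for DP-protected FL training of LLMs.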
Pages: 23