FDP-LDA: Inherent Privacy Amplification of Collapsed Gibbs Sampling via Group Subsampling

被引:0
作者
Huang, Tao [1 ,2 ]
Chen, Hong [1 ,2 ]
Zhao, Suyun [1 ,2 ]
机构
[1] Renmin Univ China, Key Lab Data Engn & Knowledge Engn, Minist Educ, Beijing, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing, Peoples R China
来源
WEB AND BIG DATA, PT III, APWEB-WAIM 2022 | 2023年 / 13423卷
基金
北京市自然科学基金; 中国国家自然科学基金;
关键词
Differential privacy; Latent dirichlet allocation; Collapsed gibbs sampling;
D O I
10.1007/978-3-031-25201-3_22
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Latent Dirichlet allocation (LDA) is a widely used fundamental tool for text analysis. Collapsed Gibbs sampling (CGS), as a widely adopted algorithm for learning the parameters of LDA, has the risk of privacy leakage. In this paper, we study the inherent privacy of CGS which is exploited to preserve the privacy for latent topic updates. We propose a method, called group subsampling, and a novel centralized privacy-preserving algorithm, called Fast-Differentially-Private LDA (FDP-LDA) to amplify the inherent privacy and improve the efficiency of traditional differentially private CGS. Theoretically, the general upper bound of the amplified inherent privacy loss in each iteration of FDP-LDA is verified mathematically. To our best knowledge, this is the first work that analyzes the inherent privacy amplification of differentially private CGS. Experimentally, results on real-world datasets validate the improved performances of FDP-LDA.
引用
收藏
页码:292 / 300
页数:9
相关论文
共 21 条
  • [1] [Anonymous], 2018, Advances in Neural Information Processing Systems
  • [2] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [3] Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds
    Bun, Mark
    Steinke, Thomas
    [J]. THEORY OF CRYPTOGRAPHY, TCC 2016-B, PT I, 2016, 9985 : 635 - 658
  • [4] Carlo C.M., 2004, Lect Notes EEB, V581, P3
  • [5] The Algorithmic Foundations of Differential Privacy
    Dwork, Cynthia
    Roth, Aaron
    [J]. FOUNDATIONS AND TRENDS IN THEORETICAL COMPUTER SCIENCE, 2013, 9 (3-4): : 211 - 406
  • [6] Foulds J, 2016, Arxiv, DOI arXiv:1603.07294
  • [7] Set-Based Adaptive Distributed Differential Evolution for Anonymity-Driven Database Fragmentation
    Ge, Yong-Feng
    Cao, Jinli
    Wang, Hua
    Chen, Zhenxiang
    Zhang, Yanchun
    [J]. DATA SCIENCE AND ENGINEERING, 2021, 6 (04) : 380 - 391
  • [8] Bi-Labeled LDA: Inferring Interest Tags for Non-famous Users in Social Network
    He, Jun
    Liu, Hongyan
    Zheng, Yiqing
    Tang, Shu
    He, Wei
    Du, Xiaoyong
    [J]. DATA SCIENCE AND ENGINEERING, 2020, 5 (01) : 27 - 47
  • [9] Sub-Gibbs Sampling: a New Strategy for Inferring LDA
    Hu, Chuan
    Cao, Huiping
    Gong, Qixu
    [J]. 2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 907 - 912
  • [10] Reducing the Sampling Complexity of Topic Models
    Li, Aaron Q.
    Ahmed, Amr
    Ravi, Sujith
    Smola, Alexander J.
    [J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 891 - 900