FDP-LDA: Inherent Privacy Amplification of Collapsed Gibbs Sampling via Group Subsampling

被引：0

作者：

Huang, Tao ^{[1
,2
]}

Chen, Hong ^{[1
,2
]}

Zhao, Suyun ^{[1
,2
]}

机构：

[1] Renmin Univ China, Key Lab Data Engn & Knowledge Engn, Minist Educ, Beijing, Peoples R China

[2] Renmin Univ China, Sch Informat, Beijing, Peoples R China

来源：

WEB AND BIG DATA, PT III, APWEB-WAIM 2022 | 2023年 / 13423卷

基金：

北京市自然科学基金; 中国国家自然科学基金;

关键词：

Differential privacy; Latent dirichlet allocation; Collapsed gibbs sampling;

D O I：

10.1007/978-3-031-25201-3_22

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Latent Dirichlet allocation (LDA) is a widely used fundamental tool for text analysis. Collapsed Gibbs sampling (CGS), as a widely adopted algorithm for learning the parameters of LDA, has the risk of privacy leakage. In this paper, we study the inherent privacy of CGS which is exploited to preserve the privacy for latent topic updates. We propose a method, called group subsampling, and a novel centralized privacy-preserving algorithm, called Fast-Differentially-Private LDA (FDP-LDA) to amplify the inherent privacy and improve the efficiency of traditional differentially private CGS. Theoretically, the general upper bound of the amplified inherent privacy loss in each iteration of FDP-LDA is verified mathematically. To our best knowledge, this is the first work that analyzes the inherent privacy amplification of differentially private CGS. Experimentally, results on real-world datasets validate the improved performances of FDP-LDA.

引用

页码：292 / 300

页数：9

共 21 条

[1] [Anonymous], 2018, Advances in Neural Information Processing Systems
[2] Latent Dirichlet allocation
Blei, DM
Ng, AY
Jordan, MI
[J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
[3] Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds
Bun, Mark
Steinke, Thomas
[J]. THEORY OF CRYPTOGRAPHY, TCC 2016-B, PT I, 2016, 9985 : 635 - 658
[4] Carlo C.M., 2004, Lect Notes EEB, V581, P3
[5] The Algorithmic Foundations of Differential Privacy
Dwork, Cynthia
Roth, Aaron
[J]. FOUNDATIONS AND TRENDS IN THEORETICAL COMPUTER SCIENCE, 2013, 9 (3-4): : 211 - 406
[6] Foulds J, 2016, Arxiv, DOI arXiv:1603.07294
[7] Set-Based Adaptive Distributed Differential Evolution for Anonymity-Driven Database Fragmentation
Ge, Yong-Feng
Cao, Jinli
Wang, Hua
Chen, Zhenxiang
Zhang, Yanchun
[J]. DATA SCIENCE AND ENGINEERING, 2021, 6 (04) : 380 - 391
[8] Bi-Labeled LDA: Inferring Interest Tags for Non-famous Users in Social Network
He, Jun
Liu, Hongyan
Zheng, Yiqing
Tang, Shu
He, Wei
Du, Xiaoyong
[J]. DATA SCIENCE AND ENGINEERING, 2020, 5 (01) : 27 - 47
[9] Sub-Gibbs Sampling: a New Strategy for Inferring LDA
Hu, Chuan
Cao, Huiping
Gong, Qixu
[J]. 2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 907 - 912
[10] Reducing the Sampling Complexity of Topic Models
Li, Aaron Q.
Ahmed, Amr
Ravi, Sujith
Smola, Alexander J.
[J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 891 - 900

← 1 2 3 →