Locally Private Set-Valued Data Analyses: Distribution and Heavy Hitters Estimation

被引:6
作者
Wang, Shaowei [1 ]
Li, Yuntong [1 ]
Zhong, Yusen [1 ]
Chen, Kongyang [1 ]
Wang, Xianmin [1 ]
Zhou, Zhili [1 ]
Peng, Fei [1 ]
Qian, Yuqiu [2 ]
Du, Jiachun [2 ]
Yang, Wei [3 ]
机构
[1] Guangzhou Univ, Inst Artificial Intelligence & Blockchain, Guangzhou 511370, Guangdong, Peoples R China
[2] Tencent Games, Interact Entertainment Grp, Shenzhen 518054, Guangdong, Peoples R China
[3] Univ Sci & Technol China, Inst Adv Res, Hefei 230052, Anhui, Peoples R China
基金
中国国家自然科学基金;
关键词
Estimation; Wheels; Protocols; Servers; Differential privacy; Costs; Distributed databases; Distributed data aggregation; frequency estimation; heavy-hitter identification; local differential privacy;
D O I
10.1109/TMC.2023.3342056
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In many mobile applications, user-generated data are presented as set-valued data. To tackle potential privacy threats in analyzing these valuable data, local differential privacy has been attracting substantial attention. However, existing approaches only provide sub-optimal utility and are expensive in computation and communication for set-valued data distribution estimation and heavy-hitter identification. In this paper, we propose a utility-optimal and efficient set-valued data publication method (i.e., Wheel mechanism). On the user side, the computational complexity is only O(min{m log m, me(is an element of)}) and communication costs are O(is an element of+ log m) bits, where m is the number of items, d is the domain size and epsilon is the privacy budget, while existing approaches usually depend on O(d) or O(log d)(d >> m). Our theoretical analyses reveal the estimation errors have been reduced from the previously known O(m(2)d/n is an element of(2)) to the optimal rate O(md/n is an element of(2)). Additionally, for heavy-hitter identification, we present a variant of the Wheel mechanism as an efficient frequency oracle, entailing only O(root n) computational complexity. This heavy-hitter protocol achieves an identification bar of O(1/is an element of root m/n log d),, reducing by a factor of O(root m) relative to existing protocols. Extensive experiments demonstrate our methods are 3-100x faster than existing approaches and have optimized statistical efficiency.
引用
收藏
页码:8050 / 8065
页数:16
相关论文
共 48 条
[21]  
Goldman E., 2019, Santa Clara Univ. Legal Stud. Res. Paper
[22]  
Greenberg A., 2016, Wired
[23]  
He Y., 2009, PVLDB, P934, DOI DOI 10.14778/1687627.1687733
[24]   Quid-Pro-Quo-tocols: Strengthening Semi-Honest Protocols with Dual Execution [J].
Huang, Yan ;
Katz, Jonathan ;
Evans, David .
2012 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2012, :272-284
[25]  
Kairouz P, 2016, PR MACH LEARN RES, V48
[26]   Characterising Usage Patterns and Privacy Risks of a Home Security Camera Service [J].
Li, Jinyang ;
Li, Zhenyu ;
Tyson, Gareth ;
Xie, Gaogang .
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2022, 21 (07) :2344-2357
[27]   PrivBasis: Frequent Itemset Mining with Differential Privacy [J].
Li, Ninghui ;
Qardaji, Wahbeh ;
Su, Dong ;
Cao, Jianneng .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (11) :1340-1351
[28]  
Lin J. Niu, 2021, IEEE Trans. Mobile Comput., V20, P3
[29]  
Lindell Y, 2005, Encyclopedia of Data Warehousing and Mining, P1005
[30]  
Machanavajjhala A., 2006, P 22 INT C DAT ENG, P24, DOI DOI 10.1186/1471-2288-10-70