Collaborative Sampling for Partial Multi-Dimensional Value Collection Under Local Differential Privacy

Cited by: 3
Authors
Qian, Qiuyu [1, 2]
Ye, Qingqing [1]
Hu, Haibo [1]
Huang, Kai [3]
Chan, Tom Tak-Lam [2]
Li, Jin [4]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[2] Ctr Adv Reliabil & Safety CAiRS, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[4] Guangzhou Univ, Inst Artificial Intelligence & Blockchain, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Local differential privacy; collaborative sampling; privacy-preserving data collection; multi-dimensional data;
DOI
10.1109/TIFS.2023.3289007
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In the big data era, companies and organizations are keen to collect data from users and analyse their behaviour patterns to make decisions or predictions for profit. However, this undermines users' privacy because the collected data can be highly sensitive and easily leaked. To address these privacy problems, local differential privacy (LDP) has been proposed, which lets untrusted data collectors obtain statistical information without compromising user privacy. Most studies on LDP assume that all users fully cooperate and contribute to the data collection process, and thus that the collected dataset is complete. In practice, however, especially when the user population is large, this assumption seldom holds due to communication loss, user unresponsiveness or unwillingness, and incomplete user-side data. Unfortunately, state-of-the-art LDP-based data collection schemes, such as GRR, OUE and OLH, cannot handle partial data collection effectively. In this paper, we propose collaborative sampling to address partial data collection in a multi-dimensional setting. Thanks to a two-phase mechanism, we can derive the optimal sampling rate for each dimension. The optimality is proved with respect to the variance of the estimated frequency. In addition, collaborative sampling is general and can be used with GRR, OUE and OLH with minimal adaptation. Experimental results show that collaborative sampling outperforms existing mainstream data collection schemes in partial multi-dimensional data collection.
Pages: 3948-3961
Page count: 14
Related papers
49 in total
  • [21] Gopi Sivakanth, 2020, C LEARNING THEORY CO, V125, P1785
  • [22] Kairouz P, 2016, J MACH LEARN RES, V17
  • [23] WHAT CAN WE LEARN PRIVATELY?
    Kasiviswanathan, Shiva Prasad
    Lee, Homin K.
    Nissim, Kobbi
    Raskhodnikova, Sofya
    Smith, Adam
    [J]. SIAM JOURNAL ON COMPUTING, 2011, 40 (03) : 793 - 826
  • [24] Lapczynski M., 2013, STUDIA EKONOMICZNE, V151, P144
  • [25] Li N., 2016, SYNTH LECT INF SECUR, V8, P1, DOI DOI 10.2200/S00735ED1V01Y201609SPT018
  • [26] Estimating Numerical Distributions under Local Differential Privacy
    Li, Zitao
    Wang, Tianhao
    Lopuhaa-Zwakenberg, Milan
    Li, Ninghui
    Skoric, Boris
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 621 - 635
  • [27] ℓ-diversity: Privacy beyond k-anonymity
    Cornell University
    Unknown
    [J]. ACM Trans. Knowl. Discov. Data, 2007, 1
  • [28] McSherry F, 2009, ACM SIGMOD/PODS 2009 CONFERENCE, P19
  • [29] A data-driven approach to predict the success of bank telemarketing
    Moro, Sergio
    Cortez, Paulo
    Rita, Paulo
    [J]. DECISION SUPPORT SYSTEMS, 2014, 62 : 22 - 31
  • [30] Qin Z., 2016, P 2016 ACM SIGSAC C, P192