Collaborative Sampling for Partial Multi-Dimensional Value Collection Under Local Differential Privacy

Citations: 3
Authors
Qian, Qiuyu [1, 2]
Ye, Qingqing [1]
Hu, Haibo [1]
Huang, Kai [3]
Chan, Tom Tak-Lam [2]
Li, Jin [4]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
[2] Ctr Adv Reliabil & Safety CAiRS, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[4] Guangzhou Univ, Inst Artificial Intelligence & Blockchain, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Local differential privacy; collaborative sampling; privacy-preserving data collection; multi-dimensional data;
DOI
10.1109/TIFS.2023.3289007
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
In the big data era, companies and organizations are keen to collect data from users and analyse their behaviour patterns to make decisions or predictions for profit. However, this practice undermines users' privacy because the collected data can be highly sensitive and is easily leaked. To address these privacy problems, local differential privacy (LDP) has been proposed so that untrusted data collectors can obtain statistical information without compromising user privacy. Most studies on LDP assume that all users fully cooperate and contribute to the data collection process, and thus that the collected dataset is complete. In practice, however, especially when the user population is large, this assumption seldom holds because of communication loss, user unresponsiveness or unwillingness, and incomplete user-side data. Unfortunately, state-of-the-art LDP-based data collection schemes, such as GRR, OUE and OLH, cannot handle partial data collection effectively. In this paper, we propose collaborative sampling to address partial data collection in a multi-dimensional setting. Thanks to a two-phase mechanism, we can derive the optimal sampling rate for each dimension. The optimality is proved with respect to the variance of the estimated frequency. In addition, collaborative sampling is general and can be used with GRR, OUE and OLH with minimal adaptation. Experimental results show that collaborative sampling outperforms existing mainstream data collection schemes in partial multi-dimensional data collection.
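For orientation, GRR (generalized randomized response), one of the baseline schemes the abstract names, can be sketched for a single dimension as below. This is a minimal illustration of the standard GRR mechanism only, not the paper's collaborative sampling method; the function names, the integer encoding of values as 0..d-1, and the explicit `rng` argument are assumptions made for this example.

```python
import math
import random
from collections import Counter


def grr_probabilities(epsilon, d):
    """Return (p, q): probability of reporting the true value, and of
    reporting any one specific other value, for a domain of size d."""
    e = math.exp(epsilon)
    p = e / (e + d - 1)
    q = 1.0 / (e + d - 1)
    return p, q


def grr_perturb(value, epsilon, d, rng):
    """User side: keep the true value with probability p, otherwise
    report one of the other d - 1 values uniformly at random."""
    p, _ = grr_probabilities(epsilon, d)
    if rng.random() < p:
        return value
    other = rng.randrange(d - 1)  # index among the d - 1 other values
    return other if other < value else other + 1


def grr_estimate(reports, epsilon, d):
    """Collector side: unbiased frequency estimate for each value,
    computed as (observed share - q) / (p - q)."""
    p, q = grr_probabilities(epsilon, d)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(d)]
```

Each user would call `grr_perturb` locally and send only the perturbed value; the collector then calls `grr_estimate` on the reports it actually received. With partial collection, fewer reports arrive for some dimensions, which increases the variance of these estimates.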
Pages: 3948 - 3961
Number of pages: 14