Continuous Release of Data Streams under both Centralized and Local Differential Privacy

被引:25
作者
Wang, Tianhao [1 ,2 ,7 ]
Chen, Joann Qiongna [3 ]
Zhang, Zhikun [4 ]
Su, Dong [5 ]
Cheng, Yueqiang [6 ]
Li, Zhou [3 ]
Li, Ninghui [7 ]
Jha, Somesh [8 ]
机构
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Univ Virginia, Charlottesville, VA 22903 USA
[3] Univ Calif Irvine, Irvine, CA USA
[4] CISPA, Saarbrucken, Germany
[5] Alibaba Inc, Hangzhou, Zhejiang, Peoples R China
[6] NIO Secur Res, San Jose, CA USA
[7] Purdue Univ, W Lafayette, IN 47907 USA
[8] Univ Wisconsin, Madison, WI 53706 USA
来源
CCS '21: PROCEEDINGS OF THE 2021 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY | 2021年
基金
美国国家科学基金会;
关键词
Differential Privacy; Local Differential Privacy; Continuous Observation; Data Stream;
D O I
10.1145/3460120.3484750
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We study the problem of publishing a stream of real-valued data satisfying differential privacy (DP). One major challenge is that the maximal possible value in the stream can be quite large, leading to enormous DP noise and bad utility. To reduce the maximal value and noise, one way is to estimate a threshold so that values above it can be truncated. The intuition is that, in many scenarios, only a few values are large; thus truncation does not change the original data much. We develop such a method that finds a suitable threshold with DP. Given the threshold, we then propose an online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps in a framework for the private publishing of streaming data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noise to the stream, and a smoother that improves the result using post-processing. Within our framework, we also design an algorithm satisfying the more stringent DP setting called local DP. Using four real-world datasets, we demonstrate that our mechanism outperforms the state-of-the-art by a factor of 6 - 10 orders of magnitude in terms of utility (measured by the mean squared error of the typical scenario of answering a random range query).
引用
收藏
页码:1237 / 1253
页数:17
相关论文
共 54 条
  • [1] A Case Study: Privacy Preserving Release of Spatio-temporal Density in Paris
    Acs, Gergely
    Castelluccia, Claude
    [J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 1679 - 1688
  • [2] [Anonymous], 2015, INT C THEOR APPL CRY, DOI DOI 10.1007/978-3-662-48800-3_30
  • [3] [Anonymous], 2018, IEEE Transactions on Information Theory
  • [4] [Anonymous], 2018, SIGMOD
  • [5] [Anonymous], 2017, NIPS
  • [6] [Anonymous], 2015, KDD
  • [7] Local, Private, Efficient Protocols for Succinct Histograms
    Bassily, Raef
    Smith, Adam
    [J]. STOC'15: PROCEEDINGS OF THE 2015 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2015, : 127 - 135
  • [8] Bolot J., 2013, P 16 INT C DAT THEOR, P284
  • [9] Private and Continual Release of Statistics
    Chan, T. -H. Hubert
    Shi, Elaine
    Song, Dawn
    [J]. ACM TRANSACTIONS ON INFORMATION AND SYSTEM SECURITY, 2011, 14 (03)
  • [10] Chan TH Hubert, 2019, 30 SODA JAN, P2448, DOI DOI 10.1137/1.9781611975482.150