Differentially Private Histogram Publishing through Lossy Compression

被引:102
作者
Acs, Gergely
Castelluccia, Claude
Chen, Rui
机构
来源
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012) | 2012年
关键词
Differential privacy; histogram; lossy compression; Fourier transform; clustering;
D O I
10.1109/ICDM.2012.80
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Differential privacy has emerged as one of the most promising privacy models for private data release. It can be used to release different types of data, and, in particular, histograms, which provide useful summaries of a dataset. Several differentially private histogram releasing schemes have been proposed recently. However, most of them directly add noise to the histogram counts, resulting in undesirable accuracy. In this paper, we propose two sanitization techniques that exploit the inherent redundancy of real-life datasets in order to boost the accuracy of histograms. They lossily compress the data and sanitize the compressed data. Our first scheme is an optimization of the Fourier Perturbation Algorithm (FPA) presented in [13]. It improves the accuracy of the initial FPA by a factor of 10. The other scheme relies on clustering and exploits the redundancy between bins. Our extensive experimental evaluation over various real-life and synthetic datasets demonstrates that our techniques preserve very accurate distributions and considerably improve the accuracy of range queries over attributed histograms.
引用
收藏
页码:1 / 10
页数:10
相关论文
共 16 条
  • [1] [Anonymous], 2006, TCC
  • [2] [Anonymous], PVLDB
  • [3] [Anonymous], 2009, SIGMOD
  • [4] Barak B., 2007, PODS
  • [5] DONOHO DL, 1994, CR ACAD SCI I-MATH, V319, P1317
  • [6] Fienberg Stephen E., 2010, PSD
  • [7] Hardt M., 2012, TECHNICAL REPORT
  • [8] Hay M., 2009, ICDM
  • [9] Jagadish H. V., 1998, VLDB
  • [10] Li C., 2010, PODS