Publishing histograms with outliers under data differential privacy

被引:12
作者
Han, Qilong [1 ]
Shao, Bo [1 ]
Li, Lijie [1 ]
Ma, Zhiqiang [1 ]
Zhang, Haitao [1 ]
Du, Xiaojiang [2 ]
机构
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
基金
中国国家自然科学基金;
关键词
differential privacy; histogram; outlier; bigdata;
D O I
10.1002/sec.1493
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Histograms are important tools for data mining and analysis. Several differentially private publishing schemes for histograms have been proposed recently. Existing differentially private histogram publication schemes have shown that histogram reconstruction is a promising idea for the improvement of publication histograms' accuracy. However, none of these have properly considered the problem outliers in the original histogram, which can cause significant reconstruction errors. Based on the problem, the publication of histogram outliers under differential privacy, this paper puts forward a publication method for histograms with outliers under differential privacy: Outlier-HistoPub. Our method deals with the count sequence of the original histogram first, using a global sort to reduce the degree of alternative distribution (a concept proposed in this paper), which may eliminate the influence of outliers during reconstruction. To avoid individual privacy leakage in the reconstruction process, an exponential mechanism is used to select the most similar adjacent bins of the uniformity distribution histogram to merge each time, and the Laplace mechanism is utilized to generate noisy data to perturb the count sequence of the reconstruction histogram. Experiments prove that the method proposed in this paper can improve the efficiency and accuracy of histogram publication. Copyright (c) 2016 John Wiley & Sons, Ltd.
引用
收藏
页码:2313 / 2322
页数:10
相关论文
共 15 条
  • [1] Acs G, 2012, P 11 IEEE INT C DAT
  • [2] Dwork C, 2006, LECT NOTES COMPUT SC, P1
  • [3] Dwork C, 2006, P 3 THEOR CRYPT C TC
  • [4] Hay M, 2010, P 36 C VER LARG DAT
  • [5] Kellaris G, 2013, P VLDB END RIV GARD
  • [6] Li H, 2015, P 24 ACM INT C INF K
  • [7] Li N., 2007, IEEE 23 INT C DAT EN, P106, DOI [10.1109/ICDE.2007.367856, DOI 10.1109/ICDE.2007.367856]
  • [8] Machanavajjhala A., 2007, ACM Trans. Knowl. Discov-ery Data, DOI DOI 10.1145/1217299.1217302
  • [9] McSherry F, 2009, ACM SIGMOD/PODS 2009 CONFERENCE, P19
  • [10] McSherry Frank, 2007, P 48 ANN IEEE S FDN