IHP: improving the utility in differential private histogram publication

被引:13
作者
Li, Hui [1 ,2 ]
Cui, Jiangtao [3 ]
Meng, Xue [1 ]
Ma, Jianfeng [1 ]
机构
[1] Xidian Univ, Sch Cyber Engn, Xian, Shaanxi, Peoples R China
[2] Natl Engn Lab Publ Secur Risk Percept & Control B, Beijing, Peoples R China
[3] Xidian Univ, Sch Comp Sci & Technol, Xian, Shaanxi, Peoples R China
关键词
Differential privacy; Data publication; Histogram; QUERIES; ALGORITHM;
D O I
10.1007/s10619-018-07255-6
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Differential privacy (DP) is a promising tool for preserving privacy during data publication, as it provides strong theoretical privacy guarantees in face of adversaries with arbitrary background knowledge. Histogram, as the result of a set of count queries, serves as a core statistical tool to report data distributions and is in fact viewed as the fundamental method for many other statistical analysis such as range queries. It is an important form for data publishing. In this paper, we consider the scenario of publishing sensitive histogram data with differential privacy scheme. Existing work in this field has justified that, comparing to directly applying DP techniques (i.e., injecting noise) over the counts in histogram bins, grouping bins before noise injection is more effective (i.e., with higher utility) as it introduces much less error over the sanitized histogram given the same privacy budget. However, state-of-the-art works have not unveiled how the overall utility of a sanitized histogram can be affected by the balance between the privacy budget distributed between grouping and noise injection phases. In this work, we conduct a theoretical study towards how the probability of getting better groups can be improved such that the overall error introduced in sanitized histogram can be further reduced, which directly leads to a higher utility for the sanitized histograms. In particular, we show that the probability of achieving better grouping can be affected by two factors, namely privacy budget assigned in grouping and the normalized utility function used for selecting groups. Motivated by that, we propose a new DP histogram publishing scheme, namely Iterative Histogram Partition, in which we carefully assign privacy budget between grouping and injection phases based on our theoretical study. We also theoretically prove that -differential privacy can be achieved according to our new scheme. Moreover, we also show that, under the same privacy budget, our scheme exhibits less errors in the sanitized histograms comparing with state-of-the-art methods. We also extends the model to multi-dimensional histogram publication cases. Finally, empirical study over four real-world datasets also justifies that our scheme achieves the least error among series of state-of-the-art baseline methods.
引用
收藏
页码:721 / 750
页数:30
相关论文
共 34 条
  • [1] Differentially Private Histogram Publishing through Lossy Compression
    Acs, Gergely
    Castelluccia, Claude
    Chen, Rui
    [J]. 12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 1 - 10
  • [2] [Anonymous], 2011, P 2011 ACM SIGMOD IN
  • [3] Barak Boaz, 2007, P 26 ACM SIGMOD SIGA, P273, DOI 10.1145/1265530.1265569
  • [4] Cho HJ, 2015, DISTRIB PARALLEL DAT, V33, P319, DOI 10.1007/s10619-014-7152-z
  • [5] Differentially Private Spatial Decompositions
    Cormode, Graham
    Procopiuc, Cecilia
    Srivastava, Divesh
    Shen, Entong
    Yu, Ting
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 20 - 31
  • [6] Calibrating noise to sensitivity in private data analysis
    Dwork, Cynthia
    McSherry, Frank
    Nissim, Kobbi
    Smith, Adam
    [J]. THEORY OF CRYPTOGRAPHY, PROCEEDINGS, 2006, 3876 : 265 - 284
  • [7] Hay M, 2010, PROC VLDB ENDOW, V3, P1021
  • [8] Privacy-Preserving Utility Verification of the Data Published by Non-Interactive Differentially Private Mechanisms
    Hua, Jingyu
    Tang, An
    Fang, Yixin
    Shen, Zhenyu
    Zhong, Sheng
    [J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2016, 11 (10) : 2298 - 2311
  • [9] A privacy-enhancing model for location-based personalized recommendations
    Huang, Jin
    Qi, Jianzhong
    Xu, Yabo
    Chen, Jian
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2015, 33 (02) : 253 - 276
  • [10] Practical Differential Privacy via Grouping and Smoothing
    Kellaris, Georgios
    Papadopoulos, Stavros
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (05): : 301 - 312