Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

Cited by: 119
Authors
Chen, Rui [1, 3]
Xiao, Qian [2]
Zhang, Yu [3 ]
Xu, Jianliang [3 ]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Hong Kong Baptist Univ, Hong Kong, Peoples R China
Source
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2015
Keywords
Differential privacy; high-dimensional data; joint distribution; dependency graph; junction tree algorithm; queries
DOI
10.1145/2783258.2783379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.
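The abstract refers to releasing marginal tables under differential privacy. As a minimal sketch of that single building block, the Python below adds Laplace noise to the cells of one marginal table. It is not the paper's sampling-based framework, threshold mechanism, or marginal selection algorithm. The names `noisy_marginal`, `records`, `attrs`, and `domains` are hypothetical, and the sketch assumes neighboring datasets differ by adding or removing one record, so the table's L1 sensitivity is 1.

```python
import random
from collections import Counter
from itertools import product


def noisy_marginal(records, attrs, domains, epsilon, rng=None):
    """Return a Laplace-noised marginal table over `attrs`.

    Assumption (not from the source): neighboring datasets differ by
    adding or removing one record, so one record changes exactly one
    cell count by 1 and the L1 sensitivity of the table is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon

    # Exact counts for each observed combination of attribute values.
    counts = Counter(tuple(r[a] for a in attrs) for r in records)

    table = {}
    # Noise every cell of the full domain, including empty cells;
    # skipping empty cells would leak which combinations are absent.
    for cell in product(*(domains[a] for a in attrs)):
        # The difference of two i.i.d. exponentials with rate 1/scale
        # is Laplace-distributed with mean 0 and the given scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        table[cell] = counts.get(cell, 0) + noise
    return table


if __name__ == "__main__":
    data = [
        {"age": "young", "smoker": 0},
        {"age": "young", "smoker": 1},
        {"age": "old", "smoker": 0},
        {"age": "old", "smoker": 0},
    ]
    doms = {"age": ["young", "old"], "smoker": [0, 1]}
    for cell, value in noisy_marginal(data, ["age", "smoker"], doms, 1.0).items():
        print(cell, round(value, 2))
```

Noisy counts can be negative or non-integer; any post-processing (clipping, normalizing, or enforcing consistency across several marginals) is a separate design choice that this sketch leaves out.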
Pages: 129-138
Number of pages: 10