Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

Cited by: 119
Authors
Chen, Rui [1, 3]
Xiao, Qian [2]
Zhang, Yu [3]
Xu, Jianliang [3]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Hong Kong Baptist Univ, Hong Kong, Peoples R China
Keywords
Differential privacy; high-dimensional data; joint distribution; dependency graph; junction tree algorithm; QUERIES
DOI
10.1145/2783258.2783379
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.
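The marginal tables mentioned in the abstract are contingency tables over small attribute subsets, and such a table can be released under differential privacy by perturbing each cell count. The following is a minimal sketch of that one step only, not the authors' algorithm: it assumes a standard Laplace mechanism (which the abstract does not specify), add-or-remove-one-record neighbouring datasets (so each count has sensitivity 1), and categorical attributes with known domains. The names `laplace_noise` and `noisy_marginal` are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domains, epsilon, rng=None):
    """Return a Laplace-noised contingency table over `attrs`.

    records: list of dicts mapping attribute name -> categorical value
    attrs:   attribute names that define the marginal
    domains: dict mapping attribute name -> list of all possible values
    epsilon: privacy budget spent on this single table

    Assumption: neighbouring datasets differ by adding or removing one
    record, so the L1 sensitivity of the whole table is 1.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    scale = 1.0 / epsilon
    table = {}
    # Noise every cell of the full domain, including empty ones;
    # skipping zero-count cells would leak which cells are empty.
    for cell in product(*(domains[a] for a in attrs)):
        table[cell] = counts.get(cell, 0) + laplace_noise(scale, rng)
    return table


if __name__ == "__main__":
    data = [
        {"age": "young", "smoker": "yes"},
        {"age": "young", "smoker": "no"},
        {"age": "old", "smoker": "no"},
    ]
    doms = {"age": ["young", "old"], "smoker": ["yes", "no"]}
    print(noisy_marginal(data, ["age", "smoker"], doms, epsilon=1.0))
```

The number of cells grows with the product of the attribute domain sizes, which is why a full joint table is impractical in high dimensions and a set of small marginals is used to approximate it instead.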
Pages: 129-138
Page count: 10