Differentially Private High-Dimensional Data Publication via Sampling-Based Inference

Cited by: 119
Authors
Chen, Rui [1, 3]
Xiao, Qian [2]
Zhang, Yu [3]
Xu, Jianliang [3]
Affiliations
[1] Samsung Res Amer, Mountain View, CA 94043 USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Hong Kong Baptist Univ, Hong Kong, Peoples R China
Keywords
Differential privacy; high-dimensional data; joint distribution; dependency graph; junction tree algorithm; QUERIES
DOI
10.1145/2783258.2783379
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Releasing high-dimensional data enables a wide spectrum of data mining tasks. Yet, individual privacy has been a major obstacle to data sharing. In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. We propose a novel solution to preserve the joint distribution of a high-dimensional dataset. We first develop a robust sampling-based framework to systematically explore the dependencies among all attributes and subsequently build a dependency graph. This framework is coupled with a generic threshold mechanism to significantly improve accuracy. We then identify a set of marginal tables from the dependency graph to approximate the joint distribution based on the solid inference foundation of the junction tree algorithm while minimizing the resultant error. We prove that selecting the optimal marginals with the goal of minimizing error is NP-hard and, thus, design an approximation algorithm using an integer programming relaxation and the constrained concave-convex procedure. Extensive experiments on real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.
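The abstract refers to releasing marginal tables under differential privacy. As a rough illustration of that building block only, the sketch below perturbs a single marginal table with the standard Laplace mechanism and normalizes it into a distribution. It is not the paper's sampling-based framework, threshold mechanism, or junction-tree marginal selection. The helper names (`laplace_noise`, `noisy_marginal`, `to_distribution`), the toy records, and the choice of sensitivity 1 (neighboring datasets differ by adding or removing one record) are all assumptions made for this example.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domains, epsilon, rng):
    """Hypothetical helper: Laplace-perturbed marginal table over `attrs`."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    # Assumption: adding or removing one record changes exactly one cell
    # by 1, so the L1 sensitivity of the table is 1.
    scale = 1.0 / epsilon
    cells = product(*(domains[a] for a in attrs))
    # Noise is added to every cell of the domain, including empty ones.
    return {cell: counts.get(cell, 0) + laplace_noise(scale, rng) for cell in cells}


def to_distribution(table):
    """Clip negative noisy counts to zero and normalize (post-processing)."""
    clipped = {cell: max(value, 0.0) for cell, value in table.items()}
    total = sum(clipped.values())
    if total == 0:
        # Fall back to uniform if every noisy count was clipped away.
        return {cell: 1.0 / len(clipped) for cell in clipped}
    return {cell: value / total for cell, value in clipped.items()}


# Toy data, invented for illustration.
domains = {"smoker": ["no", "yes"], "age": ["<40", ">=40"]}
records = [
    {"smoker": "no", "age": "<40"},
    {"smoker": "no", "age": "<40"},
    {"smoker": "yes", "age": ">=40"},
    {"smoker": "no", "age": ">=40"},
]

rng = random.Random(0)
table = noisy_marginal(records, ["smoker", "age"], domains, epsilon=1.0, rng=rng)
distribution = to_distribution(table)
```

Clipping and normalizing operate only on the noisy output, so they do not consume additional privacy budget. Releasing several marginals this way would require splitting the budget across them.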
Pages: 129-138
Page count: 10