Differentially Private Copulas, DAG and Hybrid Methods: A Comprehensive Data Utility Study

Cited by: 1
Authors
Galloni, Andrea [1]
Lendak, Imre [1, 2]
Affiliations
[1] Eotvos Lorand Univ, Fac Informat, Dept Data Sci & Engn, Budapest, Hungary
[2] Univ Novi Sad, Fac Tech Sci, Novi Sad, Serbia
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2023 | 2023 / Vol. 14162
Keywords
Synthetic Data Generation; Differential Privacy; Evaluation Metrics; Copula Functions; Bayesian Networks
DOI
10.1007/978-3-031-41456-5_21
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Differentially private (DP) synthetic data generation (SDG) algorithms take as input a dataset containing private, confidential information and produce synthetic data with comparable statistical characteristics. Such techniques are becoming more important as awareness grows of how extensively organizations collect and use data, and as new, stricter data privacy regulations come into force. Given the growing academic interest in DP SDG techniques, our study performs a comparative evaluation of the statistical similarity and utility (in terms of machine learning performance) of a specific set of related algorithms in the realistic context of credit risk and banking. The study compares the PrivBayes, Copula-Shirley, and DPCopula algorithms and their variants using a proposed evaluation framework across three different datasets. Its purpose is to thoroughly assess the resulting scores and to investigate the impact of different values of the privacy budget (epsilon) on the quality and usability of the synthetic data generated by each method. As a result, we highlight and examine the deficiencies and capabilities of each algorithm in relation to the properties of the features in the original data.
Pages: 270-281
Page count: 12