Data release for machine learning via correlated differential privacy

Cited by: 12
Authors
Shen, Hua [1]
Li, Jiqiang [1]
Wu, Ge [2]
Zhang, Mingwu [1]
Affiliations
[1] Hubei Univ Technol, Sch Comp Sci, POB 430068, Wuhan, Peoples R China
[2] Southeast Univ, Sch Cyber Sci & Engn, POB 211189, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Correlated differential privacy; Data correlation; Data release; Machine learning; Data publication
DOI
10.1016/j.ipm.2023.103349
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Traditional correlated differential privacy techniques usually introduce too much noise, which degrades data utility. Moreover, machine learning often confronts high-dimensional training sets, which incur heavy computational overhead. To address the first issue, we design a more reasonable correlation analysis method. It combines feature matching algorithms with information entropy-based feature importance to calculate the correlated degree of records accurately, which reduces data correlation and correlated sensitivity and improves data utility. This novel method for evaluating record correlation alleviates the limitations of traditional correlation calculation methods. Building on it, we provide a data release solution that combines the maximum information coefficient with differential privacy to reduce data dimensionality and improve the training efficiency of machine learning. Furthermore, we introduce a mutual information-based optimization algorithm that selects the best principal components, improving the efficiency of the data release solution. To demonstrate the effectiveness and performance of the proposed solution compared with existing schemes, we conducted experiments on three real-world datasets. The results show that our scheme reduces data correlation by up to 80% compared with existing schemes and improves machine learning accuracy by 10% to 20% under the same privacy budget.
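The abstract's central lever, that a more accurate (lower) estimate of record correlation lowers correlated sensitivity and therefore the injected noise, can be illustrated with a minimal Python sketch. It assumes a common formulation from the correlated differential privacy literature, not necessarily this paper's exact definition: record i's sensitivity is the sum over records j of |δ_ij| times the query's per-record sensitivity, the correlated sensitivity CS is the maximum over i, and Laplace noise with scale CS/ε is added to the query answer. The correlation matrix `delta` is taken as a given input; the paper's feature-matching and entropy-based procedure for estimating it is not reproduced here. All function names are hypothetical.

```python
import random
from typing import Optional, Sequence


def correlated_sensitivity(delta: Sequence[Sequence[float]],
                           record_sensitivity: Sequence[float]) -> float:
    """Return max_i sum_j |delta[i][j]| * record_sensitivity[j].

    delta[i][j] is an assumed correlation degree in [-1, 1] between records
    i and j (delta[i][i] == 1); record_sensitivity[j] bounds how much the
    query answer changes when record j alone is removed.
    """
    n = len(delta)
    if n == 0 or len(record_sensitivity) != n or any(len(row) != n for row in delta):
        raise ValueError("delta must be a non-empty n x n matrix matching record_sensitivity")
    return max(
        sum(abs(delta[i][j]) * record_sensitivity[j] for j in range(n))
        for i in range(n)
    )


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def correlated_laplace(true_answer: float,
                       delta: Sequence[Sequence[float]],
                       record_sensitivity: Sequence[float],
                       epsilon: float,
                       rng: Optional[random.Random] = None) -> float:
    """Perturb a numeric query answer with noise calibrated to correlated sensitivity."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    cs = correlated_sensitivity(delta, record_sensitivity)
    if cs == 0:
        # No record can change the answer, so no noise is needed.
        return true_answer
    return true_answer + laplace_noise(cs / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical correlation degrees among three records.
    delta = [[1.0, 0.5, 0.0],
             [0.5, 1.0, 0.2],
             [0.0, 0.2, 1.0]]
    count_sens = [1.0, 1.0, 1.0]  # count query: each record changes the count by 1
    print(correlated_sensitivity(delta, count_sens))
    print(correlated_laplace(42.0, delta, count_sens, epsilon=1.0, rng=random.Random(7)))
```

For this count query the correlated sensitivity is about 1.7 (record 2 correlates with both others), so the noise scale is 1.7/ε instead of 1/ε for independent records; any reduction in the estimated δ values shrinks the noise scale proportionally.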
Pages: 14