A Differential Privacy Budget Allocation Algorithm Based on Out-of-Bag Estimation in Random Forest

Times Cited: 4
Authors
Li, Xin [1 ]
Qin, Baodong [1 ]
Luo, Yiyuan [2 ]
Zheng, Dong [1 ,3 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Cyberspace Secur, Xian 710121, Peoples R China
[2] Huizhou Univ, Sch Comp Sci & Engn, Huizhou 516007, Peoples R China
[3] Qinghai Normal Univ, Sch Comp Sci, Xining 810008, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
differential privacy; machine learning; privacy protection; random forest; out-of-bag estimation;
DOI
10.3390/math10224338
Chinese Library Classification
O1 [Mathematics];
Discipline Classification Code
0701; 070101;
Abstract
Improving the usability of data published under differential privacy has become one of the central questions in machine learning privacy protection, and the key to this problem is allocating a reasonable privacy budget. To address it, we design a privacy budget allocation algorithm based on out-of-bag estimation in random forest. The algorithm first computes decision tree weights and feature weights from the out-of-bag data under differential privacy protection. Second, statistical methods are introduced to classify features into a best feature set, a pruned feature set, and a removable feature set. The pruned feature set is then used for pruning, so that the decision trees do not over-fit when constructing an ε-differentially private random forest. Finally, the privacy budget is allocated proportionally according to the decision tree weights and feature weights in the random forest. Experimental comparisons on the real-world Adult and Mushroom data sets demonstrate that the algorithm not only protects data security and privacy, but also improves model classification accuracy and data availability.
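The final, weight-proportional allocation step can be illustrated with a minimal sketch (not the authors' implementation): it assumes the decision tree weights and feature weights have already been derived from the out-of-bag data, and the function and variable names below are hypothetical.

    # Minimal sketch of weight-proportional privacy budget allocation.
    # Illustrative only; names, weights, and the allocation granularity
    # are assumptions, not the paper's code.

    def allocate_budget(total_epsilon, tree_weights, feature_weights):
        """Split total_epsilon across trees in proportion to tree_weights,
        then split each tree's share across retained features in proportion
        to feature_weights. Returns {tree_index: {feature: epsilon_share}}."""
        tree_norm = sum(tree_weights)
        feat_norm = sum(feature_weights.values())
        allocation = {}
        for t, w_t in enumerate(tree_weights):
            eps_tree = total_epsilon * w_t / tree_norm
            allocation[t] = {f: eps_tree * w_f / feat_norm
                             for f, w_f in feature_weights.items()}
        return allocation

    # Toy usage: 3 trees weighted by (hypothetical) out-of-bag accuracy and
    # 4 features from the best feature set, with a total budget of 1.0.
    tree_weights = [0.5, 0.3, 0.2]
    feature_weights = {"age": 0.4, "education": 0.3,
                       "hours-per-week": 0.2, "capital-gain": 0.1}
    for tree, shares in allocate_budget(1.0, tree_weights, feature_weights).items():
        print(tree, shares)

By construction, the per-feature shares within each tree sum to that tree's share and the tree shares sum to the total budget, so higher-weighted trees and features receive proportionally more of the budget and are therefore perturbed with less noise.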
Pages: 15