Decision Tree Classification with Differential Privacy: A Survey

Cited by: 74
Authors
Fletcher, Sam [1]
Islam, Md Zahidul [1]
Affiliation
[1] Charles Sturt Univ, Bathurst, NSW 2795, Australia
Keywords
Differential privacy; decision tree; decision forest; implementations; comparisons; anonymity; noise
DOI
10.1145/3337064
CLC number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, real-world considerations sometimes conflict with the goals of data mining; in particular, the privacy of the people whose data is being mined may need to be protected. This requires that the output of data mining algorithms be modified to preserve privacy without ruining the predictive power of the resulting model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm, decision trees, and on how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
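The informal guarantee above has a standard formal statement: a randomized algorithm M is ε-differentially private if, for every pair of datasets D and D′ that differ in one record and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. As a minimal sketch of how this definition touches one decision tree component, the code below labels a leaf by adding Laplace noise with scale 1/ε to its class counts (the Laplace mechanism; adding or removing one record changes one count by 1, so the sensitivity is 1). The function names, the choice of this mechanism, and the assumption that the list of class labels is public are illustrative assumptions, not the survey's own implementation.

```python
import math
import random
from collections import Counter

# Illustrative sketch only: helper names are hypothetical, not from the survey.


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # max() guards against log(0) when u == -0.5 exactly.
    return -scale * math.copysign(1.0, u) * math.log(max(1e-300, 1.0 - 2.0 * abs(u)))


def noisy_leaf_counts(labels, classes, epsilon, rng):
    """Release per-class record counts at a leaf under epsilon-DP.

    Adding or removing one record changes exactly one class count by 1,
    so the L1 sensitivity of the count vector is 1 and Laplace noise with
    scale 1/epsilon on every count satisfies epsilon-differential privacy.
    `classes` is assumed to be public, so every class gets a noisy count
    even if it never occurs at this leaf.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(labels)
    scale = 1.0 / epsilon
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in classes}


def private_leaf_label(labels, classes, epsilon, rng):
    """Predict the class whose noisy count is largest."""
    noisy = noisy_leaf_counts(labels, classes, epsilon, rng)
    return max(noisy, key=noisy.get)


if __name__ == "__main__":
    rng = random.Random(0)
    leaf = ["yes"] * 40 + ["no"] * 10  # toy leaf: 40 "yes", 10 "no"
    print(private_leaf_label(leaf, ["yes", "no"], epsilon=1.0, rng=rng))
```

Smaller ε means a larger noise scale (1/ε), so leaves holding only a few records are the first to be mislabelled; this is one concrete form of the privacy/accuracy conflict described in the abstract.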
Pages: 33