Geometric data perturbation for privacy preserving outsourced data mining

Cited by: 53
Authors
Chen, Keke [1]
Liu, Ling [2]
Affiliations
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA)
Keywords
Privacy-preserving data mining; Data perturbation; Geometric data perturbation; Privacy evaluation; Data mining algorithms; MODEL;
DOI
10.1007/s10115-010-0362-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is to balance privacy protection and data utility, which are normally considered a pair of conflicting factors. We argue that selectively preserving the task/model-specific information in perturbation will help achieve a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which is implicitly utilized by many data-mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models will deliver a comparable level of model quality over the geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different levels of attacks. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide a satisfactory privacy guarantee but also preserve modeling accuracy well.
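The abstract's central claim, that perturbed data can keep the multidimensional geometric information many models rely on, can be illustrated with a small sketch. It assumes the perturbation is a random orthogonal transform plus a random translation plus optional additive noise; the abstract does not spell out this form, and the function names geometric_perturb and pairwise_distances are hypothetical, chosen only for this illustration.

```python
import numpy as np

def geometric_perturb(X, noise_std=0.0, seed=0):
    """Illustrative perturbation of a d x n data matrix (columns are records).

    Assumed form: Q @ X + t + noise, where Q is a random orthogonal matrix,
    t is a random translation shared by all records, and noise is Gaussian.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    t = rng.uniform(0.0, 1.0, size=(d, 1))           # broadcast over all columns
    noise = noise_std * rng.standard_normal((d, n))  # zero when noise_std == 0
    return Q @ X + t + noise

def pairwise_distances(X):
    """Euclidean distance between every pair of columns of X."""
    diff = X[:, :, None] - X[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))

# Usage: without noise, distances between records are unchanged even though
# the individual values are not.
X = np.random.default_rng(1).standard_normal((3, 20))
Y = geometric_perturb(X)
distances_preserved = np.allclose(pairwise_distances(X), pairwise_distances(Y))
```

With noise_std above zero the distances are only approximately preserved, which is one way to picture the privacy versus utility balance the abstract describes.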
Pages: 657-695
Page count: 39