Geometric data perturbation for privacy preserving outsourced data mining

Cited by: 53
Authors
Chen, Keke [1 ]
Liu, Ling [2 ]
Affiliations
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Funding
US National Science Foundation;
Keywords
Privacy-preserving data mining; Data perturbation; Geometric data perturbation; Privacy evaluation; Data mining algorithms; MODEL;
DOI
10.1007/s10115-010-0362-4
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is to balance privacy protection and data utility, which are normally considered as a pair of conflicting factors. We argue that selectively preserving the task/model-specific information in perturbation will help achieve better privacy guarantee and better data utility. One type of such information is the multidimensional geometric information, which is implicitly utilized by many data-mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models will deliver a comparable level of model quality over the geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different levels of attacks. Finally, we use this evaluation framework to study a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide a satisfactory privacy guarantee but also preserve modeling accuracy well.
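The core idea behind this family of geometric perturbations can be sketched as a random rotation plus a random translation plus additive noise applied to each record. The sketch below is an illustrative toy implementation of that idea in NumPy, not the authors' exact GDP algorithm; the function name, noise level, and parameters are assumptions for demonstration. Because rotation and translation are distance-preserving (isometries), pairwise Euclidean distances survive up to the added noise, which is why distance-based models can retain comparable accuracy on the perturbed data.

```python
import numpy as np

def geometric_perturbation(X, noise_sigma=0.05, seed=0):
    """Toy sketch of a geometric perturbation: G(x) = R @ x + t + d,
    with R a random rotation (orthogonal matrix), t a random
    translation, and d Gaussian noise. Illustrative only."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
    R, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    t = rng.standard_normal(dim)            # random translation vector
    noise = rng.normal(0.0, noise_sigma, size=X.shape)
    # Apply R to each row: (R @ x_i) for every record x_i.
    return X @ R.T + t + noise, R, t

# Perturb a small synthetic data set of 100 records with 4 attributes.
X = np.random.default_rng(1).standard_normal((100, 4))
Xp, R, t = geometric_perturbation(X)

# Rotation and translation preserve pairwise distances exactly;
# only the small additive noise changes them.
d_orig = np.linalg.norm(X[0] - X[1])
d_pert = np.linalg.norm(Xp[0] - Xp[1])
```

A consequence worth noting: because the transform is an isometry plus bounded noise, models that depend only on inner products or distances (e.g. kNN, SVM with RBF kernel, k-means) see essentially the same geometry before and after perturbation.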
Pages: 657-695
Page count: 39