A tree-based data perturbation approach for privacy-preserving data mining

Cited by: 32
Authors
IEEE Computer Society [1]
Unknown [2]
Unknown [3]
Affiliations
[1] College of Management, University of Massachusetts Lowell, Lowell
[2] School of Management, University of Texas at Dallas, Richardson
Source
IEEE Trans Knowl Data Eng | 2006 / No. 9 / pp. 1278-1283
Keywords
Data mining; Data perturbation; Kd-trees; Microaggregation; Privacy
DOI
10.1109/TKDE.2006.136
Abstract
Due to growing concerns about the privacy of personal information, organizations that use their customers' records in data mining activities are forced to take actions to protect the privacy of the individuals. A frequently used disclosure protection method is data perturbation. When used for data mining, it is desirable that perturbation preserves statistical relationships between attributes, while providing adequate protection for individual confidential data. To achieve this goal, we propose a kd-tree based perturbation method, which recursively partitions a data set into smaller subsets such that data records within each subset are more homogeneous after each partition. The confidential data in each final subset are then perturbed using the subset average. An experimental study is conducted to show the effectiveness of the proposed method. © 2006 IEEE.
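The partition-then-average idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the split rule (attribute with the largest variance, cut at the median), the stopping rule, and the names `kd_tree_perturb`, `conf_col`, and `min_leaf` are all assumptions introduced here.

```python
from statistics import mean, pvariance


def kd_tree_perturb(records, conf_col, min_leaf=2):
    """Replace the confidential column with the average of its kd-tree leaf.

    records  : list of equal-length numeric tuples
    conf_col : index of the confidential attribute (hypothetical parameter)
    min_leaf : smallest subset size allowed (hypothetical parameter)
    """
    n_cols = len(records[0])
    perturbed = [list(r) for r in records]

    def recurse(indices):
        # Assumed split rule: the attribute with the largest variance in this subset.
        variances = [
            pvariance([records[i][c] for i in indices]) for c in range(n_cols)
        ]
        col = max(range(n_cols), key=variances.__getitem__)
        # Stop when the subset is too small to split or is already homogeneous.
        if len(indices) < 2 * min_leaf or variances[col] == 0:
            leaf_avg = mean(records[i][conf_col] for i in indices)
            for i in indices:
                perturbed[i][conf_col] = leaf_avg
            return
        # Assumed split point: the median, so both halves keep >= min_leaf records.
        ordered = sorted(indices, key=lambda i: records[i][col])
        mid = len(ordered) // 2
        recurse(ordered[:mid])
        recurse(ordered[mid:])

    recurse(list(range(len(records))))
    return [tuple(r) for r in perturbed]
```

For example, `kd_tree_perturb([(1, 10), (2, 20), (10, 100), (11, 110)], conf_col=1)` splits the four records into two leaves and returns `[(1, 15), (2, 15), (10, 105), (11, 105)]`. Because every value is replaced by its leaf average, the overall sum (and hence the mean) of the confidential column is unchanged, while individual values are masked within each leaf.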
Pages: 1278-1283
Page count: 6