A novel anonymization algorithm: Privacy protection and knowledge preservation

Cited by: 22
Authors
Yang, Weijia [1]
Qiao, Sanzheng [2]
Affiliations
[1] Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
[2] Department of Computing and Software, McMaster University, Hamilton, ON L8S 4K1, Canada
Keywords
Data mining; Privacy protection; Data anonymization; Knowledge preservation
DOI
10.1016/j.eswa.2009.05.097
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In data mining and knowledge discovery, there are two conflicting goals: privacy protection and knowledge preservation. On the one hand, we anonymize data to protect privacy; on the other hand, we want miners to be able to discover useful knowledge from the anonymized data. In this paper, we present an anonymization method that provides both privacy protection and knowledge preservation. Unlike most anonymization methods, which generalize or permute the data, our method anonymizes data by randomly breaking the links among attribute values in records. Through data randomization, our method maintains the statistical relations among the data and thereby preserves knowledge, whereas most anonymization methods lose it. The data anonymized by our method therefore retain useful knowledge for statistical study. Furthermore, we propose an enhanced algorithm that provides extra privacy protection in situations where a user's prior knowledge of the original data could cause privacy leakage. We analyze the privacy levels and the accuracy of knowledge preservation of our method, along with how they depend on the method's parameters. Experimental results demonstrate that, compared with existing methods, our method is effective in both privacy protection and knowledge preservation. © 2009 Elsevier Ltd. All rights reserved.
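The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea that randomization can break individual record links while keeping aggregate statistics recoverable. It swaps in a generic retention-replacement perturbation, not the authors' method: each record keeps its sensitive value with probability `p` and otherwise receives a value drawn uniformly from the attribute's domain; the per-group distribution is then estimated by inverting that known noise. The function names `perturb_sensitive` and `estimate_conditional`, the toy quasi-identifier/sensitive-attribute records, and the parameter `p` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def perturb_sensitive(records, domain, p, seed=0):
    """Hypothetical retention-replacement step (not the paper's algorithm).

    Each (quasi_identifier, sensitive) pair keeps its sensitive value with
    probability p; otherwise the value is re-drawn uniformly from `domain`.
    The quasi-identifier is left untouched, so the link between a record and
    its true sensitive value is randomized for roughly a (1 - p) fraction.
    """
    rng = random.Random(seed)
    out = []
    for qi, s in records:
        if rng.random() >= p:  # happens with probability 1 - p
            s = rng.choice(domain)
        out.append((qi, s))
    return out


def estimate_conditional(perturbed, domain, p):
    """Estimate P(sensitive = v | qi) from perturbed data.

    Inverts E[observed share of v] = p * true share + (1 - p) / |domain|
    separately within each quasi-identifier group.
    """
    groups = defaultdict(Counter)
    for qi, s in perturbed:
        groups[qi][s] += 1
    k = len(domain)
    estimates = {}
    for qi, counts in groups.items():
        n = sum(counts.values())
        estimates[qi] = {v: (counts[v] / n - (1 - p) / k) / p for v in domain}
    return estimates


# Toy data (hypothetical): age group as quasi-identifier, diagnosis as sensitive value.
domain = ["flu", "cold", "cancer"]
records = ([("young", "flu")] * 7000 + [("young", "cold")] * 3000
           + [("old", "flu")] * 2000 + [("old", "cold")] * 3000
           + [("old", "cancer")] * 5000)

anonymized = perturb_sensitive(records, domain, p=0.5, seed=42)
est = estimate_conditional(anonymized, domain, p=0.5)
for qi in sorted(est):
    print(qi, {v: round(x, 2) for v, x in est[qi].items()})
```

The estimator is unbiased but not constrained to [0, 1]: a true share of zero (for example `cancer` among `young` records) can come out as a small negative number, which is often clipped in practice. A lower `p` randomizes more records (more privacy) at the cost of noisier estimates, which mirrors the privacy/accuracy trade-off the abstract describes.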
Pages: 756-766
Number of pages: 11