Anonymizing classification data using rough set theory

Cited by: 26
Authors
Ye, Mingquan [1, 3]
Wu, Xindong [1, 2]
Hu, Xuegang [1]
Hu, Donghui [1]
Affiliations
[1] Hefei Univ Technol, Dept Comp Sci, Hefei 230009, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[3] Wannan Med Coll, Dept Comp Sci, Wuhu 241002, Peoples R China
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
k-Anonymity; Rough sets; Multi-level granulation; Attribute value taxonomy; Privacy preserving data mining; Reduction
DOI
10.1016/j.knosys.2013.01.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Identity disclosure is one of the most serious privacy concerns in many data mining applications. A well-known privacy model for protecting against identity disclosure is k-anonymity. The main goal of anonymizing classification data is to protect individual privacy while maintaining the utility of the data for building classification models. In this paper, we present an approach based on rough sets for measuring data quality and guiding the anonymization process. First, we make use of the attribute reduction theory of rough sets and introduce conditional entropy to measure the classification data quality of anonymized datasets. Then, we extend conditional entropy under single-level granulation to hierarchical conditional entropy under multi-level granulation, and study its properties by dynamically coarsening and refining attribute values. Guided by these properties, we develop an efficient search metric and present a novel algorithm for achieving k-anonymity, Hierarchical Conditional Entropy-based Top-Down Refinement (HCE-TDR), which combines rough set theory and attribute value taxonomies. Theoretical analysis and experiments on real-world datasets show that our algorithm is efficient and improves data utility. © 2013 Elsevier B.V. All rights reserved.
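
The two quantities the abstract relies on, k-anonymity and the conditional entropy of the class label given the quasi-identifier attributes, can be illustrated with a small Python sketch. This is not the HCE-TDR algorithm. It uses the standard Shannon conditional entropy, which may differ in detail from the rough-set definition in the paper, and the records, column names, and one-level taxonomy below are hypothetical values chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def equivalence_classes(rows, qi):
    """Group row indices by their values on the quasi-identifier columns."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in qi)].append(i)
    return groups


def is_k_anonymous(rows, qi, k):
    """True if every equivalence class on the quasi-identifiers has >= k rows."""
    return all(len(g) >= k for g in equivalence_classes(rows, qi).values())


def conditional_entropy(rows, qi, label):
    """Shannon entropy H(label | qi) in bits; 0 means qi determines the label."""
    n = len(rows)
    h = 0.0
    for idx in equivalence_classes(rows, qi).values():
        counts = Counter(rows[i][label] for i in idx)
        size = len(idx)
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * log2(p)
    return h


def generalize(rows, attr, taxonomy):
    """Replace each value of attr by its parent in a (hypothetical) taxonomy."""
    return [{**r, attr: taxonomy.get(r[attr], r[attr])} for r in rows]


# Hypothetical toy table: "job" and "sex" are quasi-identifiers, "class" is the label.
rows = [
    {"job": "engineer", "sex": "M", "class": "high"},
    {"job": "lawyer", "sex": "M", "class": "high"},
    {"job": "dancer", "sex": "F", "class": "low"},
    {"job": "writer", "sex": "F", "class": "low"},
]
# Hypothetical one-level attribute value taxonomy for "job".
job_taxonomy = {
    "engineer": "professional",
    "lawyer": "professional",
    "dancer": "artist",
    "writer": "artist",
}

raw_ok = is_k_anonymous(rows, ["job", "sex"], k=2)
coarse = generalize(rows, "job", job_taxonomy)
coarse_ok = is_k_anonymous(coarse, ["job", "sex"], k=2)
coarse_entropy = conditional_entropy(coarse, ["job", "sex"], "class")

# Over-generalizing (every job becomes "any", sex is suppressed) merges the classes.
top = [{**r, "job": "any"} for r in rows]
top_entropy = conditional_entropy(top, ["job"], "class")

print(raw_ok, coarse_ok, coarse_entropy, top_entropy)
```

In this toy table, one level of generalization reaches 2-anonymity while the class label stays fully determined by the quasi-identifiers, whereas generalizing to the taxonomy root raises the conditional entropy. A top-down refinement search would start from the most general values and specialize only while the k-anonymity check still passes.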
Pages: 82-94
Page count: 13