TFRP: An efficient micro aggregation algorithm for statistical disclosure control

被引:44
作者
Chang, Chin-Chen
Li, Yu-Chiang
Huang, Wen-Hung
机构
[1] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung 40724, Taiwan
[2] Natl Chung Cheng Univ, Dept Comp Sci & Informat Engn, Chiayi 62102, Taiwan
[3] Natl Tsing Hua Univ, Inst Informat Syst & Applicat, Hsinchu 30013, Taiwan
关键词
microaggregation; disclosure control; k-anonymity; database security;
D O I
10.1016/j.jss.2007.02.014
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Recently, the issue of statistic disclosure control (SDC) has attracted much attention. SDC is a very important part of data security dealing with the protection of databases. Microaggregation for SDC techniques is widely used to protect confidentiality in statistical databases released for public use. The basic problem of microaggregation is that similar records are clustered into groups, and each group contains at least k records to prevent disclosure of individual information, where k is a pre-defined security threshold. For a certain k, an optimal multivariable microaggregation has the lowest information loss. The minimum information loss is an NP-hard problem. Existing fixed-size techniques can obtain a low information loss with O(n(2)) or O(n(3)/k) time complexity. To improve the execution time and lower information loss, this study proposes the Two Fixed Reference Points (TFRP) method, a two-phase algorithm for microaggregation. In the first phase, TFRP employs the pre-computing and median-of-medians techniques to efficiently shorten its running time to O(n(2)/k). To decrease information loss in the second phase, TFRP generates variable-size groups by removing the lower homogenous groups. Experimental results reveal that the proposed method is significantly faster than the Diameter and the Centroid methods. Running on several test datasets, TFRP also significantly reduces information loss, particularly in sparse datasets with a large k. (C) 2007 Elsevier Inc. All rights reserved.
引用
收藏
页码:1866 / 1878
页数:13
相关论文
共 24 条
[1]  
ADAM NR, 1989, COMPUT SURV, V21, P515, DOI 10.1145/76894.76895
[2]  
Aggarwal G, 2005, P INT C DAT THEOR
[3]  
Atallah M., 1999, PROC 1999 WORKSHOP K, P45, DOI DOI 10.1109/KDEX.1999.836532
[4]   AN IMPROVEMENT OF THE MINIMUM DISTORTION ENCODING ALGORITHM FOR VECTOR QUANTIZATION [J].
BEI, CD ;
GRAY, RM .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1985, 33 (10) :1132-1133
[5]  
BRAND R, 2002, IST2000205069 CASC
[6]  
Cormen TH, 2003, INTRO ALGORITHMS
[7]   Ordinal, continuous and heterogeneous k-anonymity through microaggregation [J].
Domingo-Ferrer, J ;
Torra, V .
DATA MINING AND KNOWLEDGE DISCOVERY, 2005, 11 (02) :195-212
[8]   Practical data-oriented microaggregation for statistical disclosure control [J].
Domingo-Ferrer, J ;
Mateo-Sanz, JM .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (01) :189-201
[9]  
DOMINGOFERRER J, 2006, P 2006 EUR C QUAL SU
[10]  
DOMINGOFERRER J, 2004, LECT NOTES COMPUTER, V3050, P1