Minimum spanning tree partitioning algorithm for microaggregation

被引:141
作者
Laszlo, M [1 ]
Mukherjee, S [1 ]
机构
[1] Nova SE Univ, Grad Sch Comp & Informat Sci, Ft Lauderdale, FL 33314 USA
关键词
clustering; partitioning; minimum spanning tree; microdata protection; disclosure control;
D O I
10.1109/TKDE.2005.112
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents a clustering algorithm for partitioning a minimum spanning tree with a constraint on minimum group size. The problem is motivated by microaggregation, a disclosure limitation technique in which similar records are aggregated into groups containing a minimum of k records. Heuristic clustering methods are needed since the minimum information loss microaggregation problem is NP- hard. Our MST partitioning algorithm for microaggregation is sufficiently efficient to be practical for large data sets and yields results that are comparable to the best available heuristic methods for microaggregation. For data that contain pronounced clustering effects, our method results in significantly lower information loss. Our algorithm is general enough to accommodate different measures of information loss and can be used for other clustering applications that have a constraint on minimum group size.
引用
收藏
页码:902 / 911
页数:10
相关论文
共 13 条
[1]  
ADAM NR, 1989, COMPUT SURV, V21, P515, DOI 10.1145/76894.76895
[2]  
Cormen T. H., 2001, Introduction to Algorithms, V2nd
[3]  
DANDEKAR RA, 2002, LECT NOTES COMPUTER, V2316
[4]   Practical data-oriented microaggregation for statistical disclosure control [J].
Domingo-Ferrer, J ;
Mateo-Sanz, JM .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (01) :189-201
[5]  
Domingo-Ferrer J, 2001, Confidentiality, disclosure, and data access: theory and practical applications for statistical agencies, P111
[6]  
Hansen SL, 2003, IEEE T KNOWL DATA EN, V15, P1043, DOI 10.1109/TKDE.2003.1209020
[7]   Data clustering: A review [J].
Jain, AK ;
Murty, MN ;
Flynn, PJ .
ACM COMPUTING SURVEYS, 1999, 31 (03) :264-323
[8]   An efficient k-means clustering algorithm:: Analysis and implementation [J].
Kanungo, T ;
Mount, DM ;
Netanyahu, NS ;
Piatko, CD ;
Silverman, R ;
Wu, AY .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (07) :881-892
[9]  
Oganian A., 2001, Statistical Journal of the United Nations Economic Commission for Europe, V18, P345
[10]  
SEDGEWICK R, 1988, ALGORITHMS