Scalable, balanced model-based clustering

被引:0
作者
Zhong, S [1 ]
Ghosh, J [1 ]
机构
[1] Univ Texas, Dept Elect & Comp Engn, Austin, TX 78712 USA
来源
PROCEEDINGS OF THE THIRD SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2003年
关键词
model-based clustering; scalable algorithms; balanced clustering; constrained clustering;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partitional, model-based clustering algorithms are viewed as an iterative two-step optimization process-iterative model re-estimation and sample re-assignment. Instead of a maximum-likelihood (ML) assignment, a balance-constrained approach is used for the sample assignment step. An efficient iterative bipartitioning, heuristic is developed to reduce the computational complexity of this step and make the balanced sample assignment algorithm scalable to large datasets. We demonstrate the superiority of this approach to regular ML clustering on axbitrary-shape 2-D spatial data, high-dimensional text documents, and EEG time series.
引用
收藏
页码:71 / 82
页数:12
相关论文
共 32 条
[1]  
[Anonymous], COMPUTER J
[2]  
[Anonymous], 2002, J. Mach. Learn. Res
[3]  
[Anonymous], P 8 INT C DAT THEOR
[4]   Linear programming in O(n3/ln n/L) operations [J].
Anstreicher, KM .
SIAM JOURNAL ON OPTIMIZATION, 1999, 9 (04) :803-812
[5]  
Banerjee A, 2002, SIAM PROC S, P333
[6]   Frequency sensitive competitive learning for clustering on high-dimensional hyperspheres [J].
Banerjee, A ;
Ghosh, J .
PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, :1590-1595
[7]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[8]  
Bradley P., 2000, Microsoft Research Technical Report
[9]  
Dhillon I. S., 2001, P 7 ACM SIGKDD INT C, P269, DOI DOI 10.1145/502512.502550
[10]   Concept decompositions for large sparse text data using clustering [J].
Dhillon, IS ;
Modha, DS .
MACHINE LEARNING, 2001, 42 (1-2) :143-175