Cluster Forests

Cited by: 53
Authors
Yan, Donghui [1]
Chen, Aiyou [2]
Jordan, Michael I. [1, 3]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Google, Mountain View, CA 94043 USA
[3] Univ Calif Berkeley, EECS, Berkeley, CA 94720 USA
Keywords
High dimensional data analysis; Cluster ensemble; Feature selection; Spectral clustering; Stochastic block model; Consistency
DOI
10.1016/j.csda.2013.04.010
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
With inspiration from Random Forests (RF) in the context of classification, a new clustering ensemble method, Cluster Forests (CF), is proposed. Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates them via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure, kappa. CF progressively improves each local clustering in a fashion that resembles tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis reveals that the kappa measure makes it possible to grow the local clustering in a desirable way: it is "noise-resistant". A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering. (C) 2013 Elsevier B.V. All rights reserved.
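The two-stage idea in the abstract (many local clusterings on randomly probed features, then spectral aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes k-means as the base clusterer, uses a co-clustering frequency matrix as the affinity, and applies a normalized spectral embedding followed by k-means. The kappa-guided feature growth is omitted because the abstract does not define kappa; each round here simply draws a fixed-size random feature subset. All function names and parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns an integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def co_cluster_affinity(X, k, n_rounds, n_features, rng):
    """Fraction of rounds in which each pair of points shares a cluster.

    Assumption: each round clusters a random feature subset; the paper's
    kappa-guided feature selection is NOT reproduced here.
    """
    n, p = X.shape
    A = np.zeros((n, n))
    for _ in range(n_rounds):
        cols = rng.choice(p, size=n_features, replace=False)
        labels = kmeans(X[:, cols], k, rng)
        A += labels[:, None] == labels[None, :]
    return A / n_rounds

def spectral_labels(A, k, rng):
    """Cluster the top-k eigenvectors of D^{-1/2} A D^{-1/2} with k-means."""
    d = A.sum(axis=1)  # positive, since every point co-clusters with itself
    M = A / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -k:]
    norms = np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U / norms, k, rng)

# Toy data: two Gaussian blobs in 8 dimensions (synthetic, for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 8)), rng.normal(6, 1, (30, 8))])
A = co_cluster_affinity(X, k=2, n_rounds=10, n_features=3, rng=rng)
labels = spectral_labels(A, k=2, rng=rng)
```

Averaging the pairwise co-clustering indicators gives a symmetric affinity in [0, 1], which is what lets the aggregation step treat the ensemble as a weighted graph.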
Pages: 178-192
Number of pages: 15