Spectral Ensemble Clustering

被引:91
作者
Liu, Hongfu [1 ]
Liu, Tongliang [2 ,3 ]
Wu, Junjie [4 ]
Tao, Dacheng [2 ,3 ]
Fu, Yun [1 ]
机构
[1] Northeastern Univ, Coll Engn, Dept Elect & Comp Engn, Boston, MA 02115 USA
[2] Univ Technol, QCIS, Sydney, NSW, Australia
[3] Univ Technol, FEIT, Sydney, NSW, Australia
[4] Beihang Univ, Sch Econ & Management, Beijing, Peoples R China
来源
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2015年
基金
中国国家自然科学基金; 国家高技术研究发展计划(863计划); 澳大利亚研究理事会;
关键词
Ensemble Clustering; Co-association Matrix; K-means;
D O I
10.1145/2783258.2783287
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Ensemble clustering, also known as consensus clustering, is emerging as a promising solution for multi-source and/or heterogeneous data clustering. The co-association matrix based method, which redefines the ensemble clustering problem as a classical graph partition problem, is a landmark method in this area. Nevertheless, the relatively high time and space complexity preclude it from real-life large-scale data clustering. We therefore propose SEC, an efficient Spectral Ensemble Clustering method based on co-association matrix. We show that SEC has theoretical equivalence to weighted K-means clustering and results in vastly reduced algorithmic complexity. We then derive the latent consensus function of SEC, which to our best knowledge is among the first to bridge co-association matrix based method to the methods with explicit object functions. The robustness and generalizability of SEC are then investigated to prove the superiority of SEC in theory. We finally extend SEC to meet the challenge rising from incomplete basic partitions, based on which a scheme for big data clustering can be formed. Experimental results on various real-world data sets demonstrate that SEC is an effective and efficient competitor to some state-of-the-art ensemble clustering methods and is also suitable for big data clustering.
引用
收藏
页码:715 / 724
页数:10
相关论文
共 33 条
[1]  
[Anonymous], 1991, Probability in Banach Spaces
[2]  
[Anonymous], ADV LECT MACHINE LEA
[3]   Cumulative voting consensus method for partitions with a variable number of clusters [J].
Ayad, Hanan G. ;
Kamel, Mohamed S. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (01) :160-173
[4]  
Banerjee A., 2005, JMLR
[5]  
Bartlett P., 1998, IEEE T INFORM THEORY
[6]  
Biau G., 2008, IEEE T INFORM THEORY
[7]  
Carpineto C., 2012, PAMI
[8]  
Dhillon I., 2004, P KDD
[9]  
Domeniconi C., 2009, TKDD
[10]  
Fern X. Z., 2004, P ICML