Robust Ensemble Clustering by Matrix Completion

被引:56
作者
Yi, Jinfeng [1 ]
Yang, Tianbao [2 ]
Jin, Rong [1 ]
Jain, Anil K. [1 ]
Mahdavi, Mehrdad [1 ]
机构
[1] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[2] GE Global Res, Machine Learning Lab, Machida, Tokyo, Japan
来源
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012) | 2012年
基金
美国国家科学基金会;
关键词
Ensemble Clustering; Matrix Completion; Low Rank; Sparse; CONSENSUS;
D O I
10.1109/ICDM.2012.123
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Data clustering is an important task and has found applications in numerous real-world problems. Since no single clustering algorithm is able to identify all different types of cluster shapes and structures, ensemble clustering was proposed to combine different partitions of the same data generated by multiple clustering algorithms. The key idea of most ensemble clustering algorithms is to find a partition that is consistent with most of the available partitions of the input data. One problem with these algorithms is their inability to handle uncertain data pairs, i.e. data pairs for which about half of the partitions put them into the same cluster and the other half do the opposite. When the number of uncertain data pairs is large, they can mislead the ensemble clustering algorithm in generating the final partition. To overcome this limitation, we propose an ensemble clustering approach based on the technique of matrix completion. The proposed algorithm constructs a partially observed similarity matrix based on the data pairs whose cluster memberships are agreed upon by most of the clustering algorithms in the ensemble. It then deploys the matrix completion algorithm to complete the similarity matrix. The final data partition is computed by applying an efficient spectral clustering algorithm to the completed matrix. Our empirical studies with multiple real-world datasets show that the proposed algorithm performs significantly better than the state-of-the-art algorithms for ensemble clustering.
引用
收藏
页码:1176 / 1181
页数:6
相关论文
共 42 条
[1]  
[Anonymous], 1988, Algorithms for Clustering Data
[2]  
Ben-Hur Asa, 2002, Pac Symp Biocomput, P6
[3]  
Bezdek J. C., 1981, Pattern recognition with fuzzy objective function algorithms
[4]   Conceptual clustering in information retrieval [J].
Bhatia, SK ;
Deogun, JS .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 1998, 28 (03) :427-436
[5]   The Power of Convex Relaxation: Near-Optimal Matrix Completion [J].
Candes, Emmanuel J. ;
Tao, Terence .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2010, 56 (05) :2053-2080
[6]   Parallel Spectral Clustering in Distributed Systems [J].
Chen, Wen-Yen ;
Song, Yangqiu ;
Bai, Hongjie ;
Lin, Chih-Jen ;
Chang, Edward Y. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (03) :568-586
[7]   Bagging to improve the accuracy of a clustering procedure [J].
Dudoit, S ;
Fridlyand, J .
BIOINFORMATICS, 2003, 19 (09) :1090-1099
[8]  
Fern X.Z., 2004, ICML
[9]  
Fern X.Z., 2004, P 21 INT C MACH LEAR, P36, DOI DOI 10.1145/1015330.1015414
[10]  
Fern Xiaoli Zhang, 2003, P 20 INT C MACH LEAR, P186, DOI DOI 10.5555/3041838.3041862