Fuzzy semi-supervised co-clustering for text documents

Cited by: 49
Authors
Yan, Yang [1]
Chen, Lihui [1]
Tjhi, William-Chandra [1]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
Keywords
Semi-supervised learning; Heuristic; Must-link/cannot-link constraint; Fuzzy co-clustering; Nonnegative matrix factorization
DOI
10.1016/j.fss.2012.10.016
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper we propose a new heuristic semi-supervised fuzzy co-clustering algorithm (SS-HFCR) for categorizing large collections of web documents. In this approach, clustering is guided by prior knowledge, supplied by users as pairwise constraints and incorporated into the fuzzy co-clustering framework. Each constraint specifies whether a pair of documents "must" or "cannot" be clustered together. Moreover, we formulate a competitive agglomeration cost function that can also make use of this prior knowledge during clustering. Experimental studies on a number of large benchmark datasets demonstrate the strength and potential of SS-HFCR in terms of accuracy, stability, and efficiency, compared with some recent popular semi-supervised clustering approaches. © 2012 Elsevier B.V. All rights reserved.
Pages: 74-89
Page count: 16