Semi-supervised concept factorization for document clustering

Citations: 47
Authors
Lu, Mei [1, 2]
Zhao, Xiang-Jun [2]
Zhang, Li [1]
Li, Fan-Zhang [1]
Affiliations
[1] Suzhou Univ, Coll Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China
[2] Jiangsu Normal Univ, Coll Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Concept factorization; Locally consistent concept factorization; Semi-supervised document clustering; Nonnegative matrix factorization
DOI
10.1016/j.ins.2015.10.038
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Nonnegative Matrix Factorization (NMF) and Concept Factorization (CF) are two popular methods for finding a low-rank approximation of a nonnegative matrix. Unlike NMF, CF can be applied not only to matrices containing negative values but also in the kernel space. Many methods based on NMF and CF, such as Graph regularized Nonnegative Matrix Factorization (GNMF) and Locally Consistent Concept Factorization (LCCF), can significantly improve clustering performance. However, these are unsupervised learning methods. To enhance clustering performance with supervisory information, this paper proposes Semi-Supervised Concept Factorization (SSCF), which incorporates pairwise constraints into CF as reward and penalty terms. This guarantees that data points belonging to the same cluster in the original space remain in the same cluster in the transformed space. In comparisons with state-of-the-art algorithms (KM, NMF, CF, LCCF, GNMF, PCCF), experimental results on document clustering show that the proposed algorithm performs better in terms of accuracy and mutual information. (C) 2015 Elsevier Inc. All rights reserved.
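For readers unfamiliar with CF, the following is a minimal sketch of plain (unsupervised) concept factorization, which the abstract names as the base model. It assumes the standard formulation X ≈ X W Vᵀ with multiplicative updates and a nonnegative data matrix. The reward and penalty terms that SSCF adds for pairwise constraints are not specified in the abstract, so they are not reproduced here. The function name `concept_factorization` and its parameters are illustrative, not taken from the paper.

```python
import numpy as np


def concept_factorization(X, k, n_iter=200, eps=1e-10, seed=0):
    """Plain CF sketch: find nonnegative W, V with X ~= X @ W @ V.T.

    X is (n_features, n_docs) and assumed nonnegative here, so that the
    Gram matrix K = X.T @ X is nonnegative and the multiplicative
    updates keep W and V nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_docs = X.shape[1]
    K = X.T @ X                      # linear kernel between documents
    W = rng.random((n_docs, k))      # concept weights over documents
    V = rng.random((n_docs, k))      # document-to-concept memberships
    for _ in range(n_iter):
        # W <- W * (K V) / (K W V^T V)
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        # V <- V * (K W) / (V W^T K W)
        KW = K @ W
        V *= KW / (V @ (W.T @ KW) + eps)
    return W, V


# Usage: cluster 10 random "documents" into 2 groups.
X = np.random.default_rng(1).random((6, 10))
W, V = concept_factorization(X, k=2)
labels = V.argmax(axis=1)            # hard cluster assignment per document
```

Because the updates touch the data only through K, a kernel matrix could in principle be substituted for `X.T @ X`, which is the kernel-space property the abstract mentions.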
Pages: 86-98
Page count: 13