A nonnegative matrix factorization framework for semi-supervised document clustering with dual constraints

Cited by: 10
Authors
Ma, Huifang [1]
Zhao, Weizhong [2]
Shi, Zhongzhi [3]
Affiliations
[1] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Gansu, Peoples R China
[2] Xiangtan Univ, Coll Informat Engn, Xiangtan 411105, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Nonnegative matrix factorization; Semi-supervised clustering; Dual constraints; Pair-wise constraints; Word-level constraints; ALGORITHMS;
DOI
10.1007/s10115-012-0560-3
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we propose a new semi-supervised co-clustering algorithm, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF), for document clustering. In this approach, clustering incorporates two kinds of prior knowledge into the NMF co-clustering framework: domain knowledge about data points (documents), given as pair-wise constraints, and category knowledge about features (words). Under this framework, the clustering problem is formulated as finding a local minimizer of an objective function that takes this dual prior knowledge into account. We derive the update rules and design an iterative algorithm for the co-clustering process. Theoretically, we prove the correctness and convergence of the algorithm and demonstrate its mathematical rigor. Experimental evaluations show that incorporating these constraints markedly improves document clustering performance.
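The abstract does not give the OSS-NMF objective or its update rules, so the sketch below is not the authors' algorithm. It only illustrates the general idea of NMF co-clustering that the abstract builds on: a plain, unconstrained nonnegative tri-factorization X ≈ F S Gᵀ fitted with standard multiplicative updates. The function name `tri_nmf`, the document-by-word orientation of `X`, and all parameters are assumptions made for illustration. The pair-wise and word-level constraints and the orthogonality conditions are deliberately left out.

```python
import numpy as np

def tri_nmf(X, k_docs, k_words, n_iter=200, eps=1e-9, seed=0):
    """Unconstrained nonnegative tri-factorization X ~ F @ S @ G.T.

    Illustrative only: X is assumed to be a nonnegative document-by-word
    matrix. F holds document-cluster memberships, G holds word-cluster
    memberships, and S links the two sets of clusters.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    F = rng.random((n_docs, k_docs)) + eps
    S = rng.random((k_docs, k_words)) + eps
    G = rng.random((n_words, k_words)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the squared Frobenius reconstruction error.
        errors.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, errors

# Toy document-by-word matrix with two visible blocks (made-up data).
X = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])
F, S, G, errors = tri_nmf(X, k_docs=2, k_words=2)
doc_clusters = F.argmax(axis=1)   # hard document assignments
word_clusters = G.argmax(axis=1)  # hard word assignments
```

In a semi-supervised variant, prior knowledge would enter as extra terms in the objective, which in turn changes the update rules; the exact form used by OSS-NMF is given in the paper itself, not here.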
Pages: 629-651
Page count: 23