Non-negative matrix factorization for semi-supervised data clustering

被引:103
作者
Chen, Yanhua [1 ]
Rege, Manjeet [1 ]
Dong, Ming [1 ]
Hua, Jing [1 ]
机构
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
基金
美国国家科学基金会;
关键词
Non-negative matrix factorization; Semi-supervised clustering; Pairwise constraint;
D O I
10.1007/s10115-008-0134-6
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Traditional clustering algorithms are inapplicable to many real-world problems where limited knowledge from domain experts is available. Incorporating the domain knowledge can guide a clustering algorithm, consequently improving the quality of clustering. In this paper, we propose SS-NMF: a semi-supervised non-negative matrix factorization framework for data clustering. In SS-NMF, users are able to provide supervision for clustering in terms of pairwise constraints on a few data objects specifying whether they "must" or "cannot" be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moveover, we show that SS-NMF provides a general framework for semi-supervised clustering. Existing approaches can be considered as special cases of it. Through extensive experiments conducted on publicly available datasets, we demonstrate the superior performance of SS-NMF for clustering.
引用
收藏
页码:355 / 379
页数:25
相关论文
共 54 条
  • [1] [Anonymous], INT C KNOWL DISC DAT, DOI DOI 10.1145/1014052.1014062
  • [2] [Anonymous], 2000, PRINCIPLES DATA MINI, DOI DOI 10.1007/3-540-45372-5_46
  • [3] [Anonymous], REUTERS 21578 TEXT C
  • [4] Bansal N, 2002, ANN IEEE SYMP FOUND, P238, DOI 10.1109/SFCS.2002.1181947
  • [5] Bar-Hillel A., 2003, P 20 INT C MACH LEAR, P11
  • [6] Basu S., 2002, P 19 INT C MACH LEAR, V2, P27
  • [7] Blake C.L., 1998, UCI repository of machine learning databases
  • [8] Blum A., 1998, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, P92, DOI 10.1145/279943.279962
  • [9] Principal direction divisive partitioning
    Boley, D
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (04) : 325 - 344
  • [10] Clustering with qualitative information
    Charikar, M
    Guruswami, V
    Wirth, A
    [J]. 44TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2003, : 524 - 533