Constrained neighborhood preserving concept factorization for data representation

Cited by: 13
Authors
Lu, Mei [1, 2, 3, 4]
Zhang, Li [1, 3]
Zhao, Xiang-Jun [2, 4]
Li, Fan-Zhang [1, 3]
Affiliations
[1] Soochow Univ, Coll Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Jiangsu Normal Univ, Coll Comp Sci & Technol, Xuzhou 221116, Peoples R China
[3] 1 Shizi St, Suzhou, Jiangsu, Peoples R China
[4] 101 Shanghai Rd, Xuzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Concept factorization; Locally consistent concept factorization; Semi-supervised document clustering; Neighborhood preserving; Data representation; NONNEGATIVE MATRIX FACTORIZATION; PARTS;
DOI
10.1016/j.knosys.2016.04.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Matrix factorization techniques such as nonnegative matrix factorization (NMF) and concept factorization (CF) have attracted a great deal of attention in recent years, mainly because of their capacity for dimensionality reduction and sparse data representation. Both techniques are unsupervised and therefore do not use a priori knowledge to guide the clustering process, which can lead to inferior performance in some scenarios. As a remedy, a semi-supervised learning method called Pairwise Constrained Concept Factorization (PCCF) was introduced to incorporate pairwise constraints into the CF framework. Despite its improved performance, PCCF uses only a priori knowledge and neglects the proximity information of the whole data distribution; when only limited a priori information is available, this can lead to rather poor performance (although slightly better than that of CF). To address this issue, we propose a novel method called Constrained Neighborhood Preserving Concept Factorization (CNPCF). CNPCF uses both a priori knowledge and the local geometric structure of the dataset to guide its clustering. Experimental studies on three real-world clustering tasks demonstrate that our method yields a better data representation and achieves much improved clustering performance, in terms of accuracy and mutual information, compared with state-of-the-art techniques. © 2016 Elsevier B.V. All rights reserved.
Pages: 127-139
Page count: 13