Supervised input space scaling for non-negative matrix factorization

Cited by: 4
Authors
Driesen, J. [1]
Van Hamme, H. [1]
Affiliations
[1] Katholieke Univ Leuven, Dept ESAT, Louvain, Belgium
Keywords
Machine learning; Pattern detection; Feature selection; Automatic relevance determination; Vocabulary acquisition; Document clustering;
DOI
10.1016/j.sigpro.2011.07.016
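Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract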
Discovering structure within a collection of high-dimensional input vectors is a problem that often recurs in the area of machine learning. A very suitable and widely used algorithm for solving such tasks is Non-negative Matrix Factorization (NMF). The high-dimensional vectors are arranged as columns in a data matrix, which is decomposed into two non-negative matrix factors of much lower rank. Here, we adopt the NMF learning scheme proposed by Van hamme (2008) [1]. It involves combining the training data with supervisory data, which imposes the low-dimensional structure known to be present. The reconstruction of such supervisory data on previously unseen inputs then reveals their underlying structure in an explicit way. It has been noted that for many problems, not all features of the training data correlate equally well with the underlying structure. In other words, some features are relevant for detecting patterns in the data, while others are not. In this paper, we propose an algorithm that builds upon the learning scheme of Van hamme (2008) [1], and automatically weights each input feature according to its relevance. Applications include both data improvement and feature selection. We experimentally show that our algorithm outperforms similar techniques on both counts. © 2011 Elsevier B.V. All rights reserved.
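The supervised scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the feature weighting the paper proposes. The assumptions are: supervisory data is stacked on top of the training data as extra rows, the factorization uses the standard multiplicative updates for the generalized Kullback-Leibler divergence, and the data is synthetic. The names `nmf_kl`, `make_data`, `W_g` and `W_v` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, rank, n_iter=300, W_fixed=None, seed=0):
    """Factor V ~= W @ H with multiplicative updates (generalized KL divergence).

    Assumption: this cost function is chosen for illustration only.
    If W_fixed is given, W is kept as-is and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + EPS if W_fixed is None else W_fixed
    H = rng.random((rank, n_cols)) + EPS
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (column sums of W)
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        if W_fixed is None:
            # W <- W * ((V / WH) H^T) / (row sums of H)
            W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def make_data(n_samples, W_true, rng):
    """Synthetic data: each column is one pattern plus small non-negative noise."""
    n_features, n_patterns = W_true.shape
    labels = rng.integers(0, n_patterns, size=n_samples)
    G = np.eye(n_patterns)[:, labels]  # one-hot supervisory data
    V = W_true @ G + 0.05 * rng.random((n_features, n_samples))
    return G, V


rng = np.random.default_rng(0)
W_true = rng.random((20, 3))  # 20 features, 3 hidden patterns
G_train, V_train = make_data(100, W_true, rng)
G_test, V_test = make_data(30, W_true, rng)

# Training: factor the supervisory rows and the data rows jointly.
W, _ = nmf_kl(np.vstack([G_train, V_train]), rank=3)
W_g, W_v = W[:3], W[3:]  # supervisory part and data part of the dictionary

# Testing: estimate activations from unseen inputs only, then
# reconstruct the supervisory rows to expose the underlying structure.
_, H_test = nmf_kl(V_test, rank=3, W_fixed=W_v)
G_hat = W_g @ H_test
predicted = G_hat.argmax(axis=0)  # most strongly reconstructed pattern per column
```

Comparing `predicted` with `G_test.argmax(axis=0)` gives a rough check of how well the reconstructed supervisory rows recover the hidden patterns on this toy data.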
Pages: 1864-1874
Page count: 11
Related papers
38 in total
  • [1] [Anonymous], J MACHINE LEARNING R
  • [2] [Anonymous], 1994, MACHINE LEARNING P 1, DOI 10.1016/B978-1-55860-335-6.50023-4
  • [3] Barman PC, 2006, LECT NOTES COMPUT SC, V4233, P703
  • [4] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [5] Buntine W, 2002, LECT NOTES ARTIF INT, V2430, P23
  • [6] A LIMITED MEMORY ALGORITHM FOR BOUND CONSTRAINED OPTIMIZATION
    BYRD, RH
    LU, PH
    NOCEDAL, J
    ZHU, CY
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1995, 16 (05) : 1190 - 1208
  • [7] Chen Y., 2007, P ICDM 2007 OM NE US
  • [8] Non-negative matrix factorization for semi-supervised data clustering
    Chen, Yanhua
    Rege, Manjeet
    Dong, Ming
    Hua, Jing
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 17 (03) : 355 - 379
  • [9] Donoho D., 2003, P NIPS 2003 WHISTL B
  • [10] Elkan C, 2005, LECT NOTES COMPUT SC, V3772, P295