Selecting feature subset with sparsity and low redundancy for unsupervised learning

Cited by: 47
Authors
Han, Jiuqi [1 ]
Sun, Zhengya [1 ]
Hao, Hongwei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised feature selection; Nonnegative spectral analysis; Sparsity and low redundancy; FACE RECOGNITION;
DOI
10.1016/j.knosys.2015.06.008
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection techniques are attracting more and more attention with the growing number of domains that produce high-dimensional data. Due to the absence of class labels, many researchers focus on the unsupervised scenario, attempting to find an optimal feature subset that preserves the original data distribution. However, the existing methods either fail to achieve sparsity or ignore the potential redundancy among features. In this paper, we propose a novel unsupervised feature selection algorithm, which retains the preserving power, and implements high sparsity and low redundancy in a unified manner. On the one hand, to preserve the data structure of the whole feature set, we build the graph Laplacian matrix and learn the pseudo class labels through spectral analysis. By finding a feature weight matrix, we are allowed to map the original data into a low-dimensional space based on the pseudo labels. On the other hand, to ensure the sparsity and low redundancy simultaneously, we introduce a novel regularization term into the objective function with the nonnegative constraints imposed, which can be viewed as the combination of the matrix norms ‖·‖_{m1} and ‖·‖_{m2} on the weights of features. An iterative multiplicative algorithm is accordingly designed with proved convergence to efficiently solve the constrained optimization problem. Extensive experimental results on different real-world data sets demonstrate the promising performance of our proposed method over the state-of-the-art methods. (C) 2015 Elsevier B.V. All rights reserved.
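The pipeline the abstract describes (build a graph Laplacian, learn pseudo class labels by spectral analysis, then learn a nonnegative feature weight matrix by multiplicative updates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the kNN Gaussian affinity graph, the absolute-value trick for nonnegative pseudo labels, the simplified redundancy penalty tr(W^T(11^T − I)W), and all parameter values are stand-ins for the exact ‖·‖_{m1}/‖·‖_{m2} regularizer and the convergence-proved updates of the paper.

```python
import numpy as np

def knn_affinity(X, k=5, sigma=1.0):
    """Gaussian affinity restricted to k nearest neighbors (assumed graph construction)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)              # exclude self-loops
    S = np.zeros_like(d2)
    nn = np.argsort(d2, axis=1)[:, :k]
    for i, js in enumerate(nn):
        S[i, js] = np.exp(-d2[i, js] / (2 * sigma ** 2))
    return np.maximum(S, S.T)                 # symmetrize

def pseudo_labels(X, c, k=5):
    """Pseudo class labels: the c smoothest eigenvectors of the graph Laplacian,
    made nonnegative by taking absolute values (a simplification)."""
    S = knn_affinity(X, k)
    L = np.diag(S.sum(axis=1)) - S            # unnormalized Laplacian L = D - S
    _, vecs = np.linalg.eigh(L)
    return np.abs(vecs[:, :c])

def select_features(X, c=3, alpha=0.1, n_iter=200, eps=1e-12):
    """Multiplicative updates for nonnegative W minimizing
    ||X W - F||_F^2 + alpha * tr(W^T (11^T - I) W): the second term penalizes
    joint weights on different features, a crude stand-in for low redundancy."""
    n, d = X.shape
    F = pseudo_labels(X, c)
    A = X.T @ X
    Ap, An = np.maximum(A, 0), np.maximum(-A, 0)   # split gradient terms by sign
    P = X.T @ F
    Pp, Pn = np.maximum(P, 0), np.maximum(-P, 0)
    B = np.ones((d, d)) - np.eye(d)
    W = np.random.default_rng(0).random((d, c))
    for _ in range(n_iter):
        num = Pp + An @ W
        den = Pn + Ap @ W + alpha * (B @ W) + eps
        W *= num / den                             # keeps W entrywise nonnegative
    return W.sum(axis=1)                           # per-feature importance score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
scores = select_features(X, c=3)
print(scores.shape)  # (10,)
```

Ranking features by these scores and keeping the top-m rows of W is the usual way such a weight matrix is turned into a feature subset; the sign-split multiplicative rule above is the standard semi-NMF-style device for keeping W nonnegative when X^T F contains negative entries.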
Pages: 210-223
Page count: 14