Feature selection with SVD entropy: Some modification and extension

Cited by: 62
Authors
Banerjee, Monami [1]
Pal, Nikhil R. [1]
Affiliations
[1] Indian Stat Inst, Elect & Commun Sci Unit, Kolkata 700108, W Bengal, India
Keywords
Feature selection; Singular Value Decomposition; Entropy; Expression data; Gene selection; Classification; Cancer; Discovery
DOI
10.1016/j.ins.2013.12.029
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many approaches have been developed for dimensionality reduction. These approaches can be broadly categorized into supervised and unsupervised methods. In supervised dimensionality reduction, the target value, which may be a class label, is known for every input vector, and the objective is to select a subset of features with adequate discriminating power to predict that target value. In the unsupervised case no target value is available, so the aim is mainly to find a subset of features that captures the inherent "structure" of the data, such as the neighborhood relation or the cluster structure. In this work, we first study a Singular Value Decomposition (SVD) based unsupervised feature selection approach proposed by Varshavsky et al. We then propose a modification of this method to improve its performance. An SVD-entropy based supervised feature selection algorithm is also developed. The algorithms are evaluated on 13 benchmark data sets and one synthetic data set. The quality of the selected features is assessed using three indices: Sammon's Error (SE), Cluster Preservation Index (CPI), and MisClassification Error (MCE) of a 1-Nearest Neighbor (1-NN) classifier. Besides showing the improvement of the modified unsupervised scheme over the existing one, we compare the modified unsupervised algorithm with one well-known unsupervised feature selection method, and the proposed supervised algorithm with two popular supervised feature selection methods. Our results reveal the effectiveness of the proposed algorithms in selecting relevant features. (C) 2014 Elsevier Inc. All rights reserved.
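The abstract does not state the entropy formula, so the following is only a minimal sketch, assuming the commonly cited definition of SVD entropy: squared singular values normalized to sum to one, Shannon entropy scaled by the log of the number of singular values, and a leave-one-out "contribution" per feature. The function names `svd_entropy` and `feature_contributions`, the samples-by-features layout, and the handling of degenerate inputs are illustrative assumptions, not the authors' implementation or their proposed modification.

```python
import numpy as np


def svd_entropy(A):
    """Normalized SVD entropy of a data matrix (assumed rows = samples, columns = features)."""
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    total = np.sum(s ** 2)
    # Degenerate cases (all-zero matrix or a single singular value): define entropy as 0.
    if total == 0 or len(s) < 2:
        return 0.0
    v = s ** 2 / total          # normalized squared singular values, sum to 1
    v = v[v > 0]                # treat 0 * log(0) as 0
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))


def feature_contributions(A):
    """Leave-one-out contribution of each feature: E(A) - E(A without that column)."""
    A = np.asarray(A, dtype=float)
    base = svd_entropy(A)
    return np.array(
        [base - svd_entropy(np.delete(A, i, axis=1)) for i in range(A.shape[1])]
    )
```

Under this sketch, features with larger contribution values would be ranked as more informative. Note that when there are fewer features than samples, dropping a column also changes the number of singular values used in the normalization, so the scores are best read as a rough illustration.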
Pages: 118-134
Page count: 17