A genetic algorithm-based method for feature subset selection

Cited by: 12
Authors
Tan, Feng [1]
Fu, Xuezheng [1]
Zhang, Yanqing [1]
Bourgeois, Anu G. [1]
Affiliation
[1] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30302 USA
Keywords
feature selection; gene selection; genetic algorithm; microarray gene expression data analysis
DOI
10.1007/s00500-007-0193-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables with which to build models describing the data. By removing redundant, irrelevant, or noisy features, feature selection can improve both the predictive accuracy and the comprehensibility of predictors or classifiers. Researchers have introduced many feature selection algorithms with different selection criteria; however, no single criterion has been found to be best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include its ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well with the particular inductive learning algorithm used to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach robustly and effectively finds feature subsets with higher classification accuracy and/or smaller size than those found by each individual feature selection algorithm.
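The GA-based framework summarized in the abstract can be illustrated with a small wrapper sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, genetic operators, or fitness function, so every detail below is an assumption. Each chromosome is a bit mask over the features; two simple filter rankings (absolute class-mean difference and a Welch t statistic, standing in for the "existing feature selection methods") seed the initial population alongside random masks; fitness is the hold-out accuracy of a nearest-centroid classifier (a stand-in for the "inductive learning algorithm of interest") minus a small penalty on subset size. All function names, the toy data, and parameters such as `penalty`, `pop_size`, and `generations` are hypothetical.

```python
# Hypothetical sketch, NOT the method of Tan et al.: names, operators, fitness and
# toy data are illustrative assumptions chosen to show a seeded GA wrapper.
import random
import statistics


def make_toy_data(n_samples=60, n_features=20, n_informative=3, seed=0):
    """Hypothetical toy data: only the first n_informative features depend on the class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def split_indices(n_samples):
    """Hypothetical fixed split: every third sample is held out for testing."""
    idx = range(n_samples)
    return [i for i in idx if i % 3 != 0], [i for i in idx if i % 3 == 0]


def centroid_accuracy(X, y, train, test, subset):
    """Hold-out accuracy of a nearest-centroid classifier restricted to `subset`."""
    if not subset:
        return 0.0
    cent = {lab: [statistics.fmean(X[i][j] for i in train if y[i] == lab) for j in subset]
            for lab in (0, 1)}
    correct = 0
    for i in test:
        dist = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(subset, cs))
                for lab, cs in cent.items()}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test)


def mean_difference_scores(X, y, train):
    """Stand-in filter criterion 1: absolute difference of the class means."""
    return [abs(statistics.fmean(X[i][j] for i in train if y[i] == 1)
                - statistics.fmean(X[i][j] for i in train if y[i] == 0))
            for j in range(len(X[0]))]


def t_statistic_scores(X, y, train):
    """Stand-in filter criterion 2: absolute Welch t statistic per feature."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in train if y[i] == 1]
        b = [X[i][j] for i in train if y[i] == 0]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / se)
    return scores


def top_k_mask(scores, k):
    """Turn a filter ranking into a bit-mask chromosome selecting the k best features."""
    top = set(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k])
    return [1 if j in top else 0 for j in range(len(scores))]


def ga_select(X, y, train, test, seeds, pop_size=20, generations=30,
              penalty=0.05, rng_seed=1):
    """Seeded GA wrapper; fitness = hold-out accuracy - penalty * fraction of features kept."""
    rng = random.Random(rng_seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            subset = [j for j, bit in enumerate(mask) if bit]
            cache[key] = (centroid_accuracy(X, y, train, test, subset)
                          - penalty * len(subset) / n)
        return cache[key]

    # Initial population: filter-method seeds first, then random bit masks.
    pop = [list(s) for s in seeds]
    while len(pop) < pop_size:
        pop.append([int(rng.random() < 0.5) for _ in range(n)])
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection (size 3)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - b if rng.random() < 1.0 / n else b for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return [j for j, bit in enumerate(best) if bit], fitness(best)


if __name__ == "__main__":
    X, y = make_toy_data()
    train, test = split_indices(len(X))
    seeds = [top_k_mask(mean_difference_scores(X, y, train), 3),
             top_k_mask(t_statistic_scores(X, y, train), 3)]
    subset, score = ga_select(X, y, train, test, seeds)
    print("selected features:", subset, "fitness:", round(score, 3))
```

Because the best chromosome survives each generation unchanged (elitism), the returned fitness can never fall below that of the best seed subset. Adding further filter rankings only means appending more masks to `seeds`, and a different classifier can be swapped in by replacing `centroid_accuracy`.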
Pages: 111-120
Number of pages: 10