Using genetic programming for context-sensitive feature scoring in classification problems

Cited by: 19
Authors
Neshatian, Kourosh [1]
Zhang, Mengjie [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
Keywords
genetic programming; feature scoring; feature ranking; feature selection; classification; FEATURE-SELECTION; ALGORITHM; DESIGN;
DOI
10.1080/09540091.2011.630065
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature scoring is an approach to feature selection that assigns a measure of usefulness to each individual feature of a classification task. Features are ranked by their scores, and selection is performed by choosing a small group of high-ranked features. Most existing feature scoring/ranking methods are context-insensitive: they measure the relevance of a single feature to the class labels regardless of the role of other features. The paper proposes a genetic programming (GP)-based method that examines how a set of features can contribute towards discriminating between different classes. Each feature receives its score in the context of the other features participating in a GP program. The scoring mechanism is based on the frequency of appearance of each feature in a collection of GP programs and the fitness of those programs. Our results show that the proposed feature ranking method can detect important features of a problem, and that a variety of different classifiers restricted to just a few of these high-ranked features work well. The proposed scoring-ranking mechanism can also shrink the search space of feature subsets from size O(2^n) to a search space of size O(n) that contains points very likely to improve the classification performance.
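The abstract names the ingredients of the scoring mechanism (how often each feature appears in a collection of GP programs, and the fitness of those programs) but does not give the formula. The sketch below is one plausible reading, not the authors' method. It assumes each evolved program has already been reduced to the list of feature indices at its terminal nodes plus a fitness in [0, 1], and it assumes a fitness-weighted relative frequency as the score. The names `score_features`, `rank_features` and `nested_subsets` are hypothetical. The last function illustrates the O(2^n) to O(n) reduction: instead of all subsets, only the n "top-k" prefixes of the ranking are considered.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation (not from the paper): a GP program is reduced to
# the feature indices used at its terminals, paired with its fitness in [0, 1].
Program = Tuple[Sequence[int], float]


def score_features(programs: List[Program], n_features: int) -> List[float]:
    """Assumed scoring rule: fitness-weighted relative frequency of each feature."""
    scores = [0.0] * n_features
    for features, fitness in programs:
        counts = Counter(features)
        total = sum(counts.values())
        if total == 0:
            continue  # a program that uses no features contributes nothing
        for feature, count in counts.items():
            # A feature earns more when it appears often in a fit program.
            scores[feature] += fitness * count / total
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Feature indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def nested_subsets(ranking: Sequence[int]) -> List[List[int]]:
    """The n candidate subsets formed by the top-k features, k = 1..n."""
    return [list(ranking[:k]) for k in range(1, len(ranking) + 1)]


if __name__ == "__main__":
    # Toy collection of three programs over four features (made-up values).
    programs = [([0, 2, 2], 0.9), ([2, 3], 0.8), ([1, 0], 0.4)]
    ranking = rank_features(score_features(programs, n_features=4))
    print(ranking)  # [2, 0, 3, 1]
    print(len(nested_subsets(ranking)))  # 4 candidate subsets instead of 2**4
```

In this reading, a downstream classifier would be evaluated on each of the n nested subsets rather than on all 2^n subsets, which is where the linear-size search space comes from.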
Pages: 183-207
Number of pages: 25