A new penalty-based wrapper fitness function for feature subset selection with evolutionary algorithms

Cited by: 18
Authors
Chakraborty, Basabi [1]
Kawamura, Atsushi [1]
Affiliations
[1] Iwate Prefectural Univ, Dept Software & Informat Sci, 152-52 Sugo, Takizawa 0200693, Japan
Keywords
Feature subset selection; wrapper fitness function with penalty; evolutionary computation
DOI
10.1080/24751839.2018.1423792
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature subset selection is an important preprocessing task for any real-life data mining or pattern recognition problem. Evolutionary computation (EC) algorithms are popular search algorithms for feature subset selection. With classification accuracy as the fitness function, EC algorithms end up with feature subsets that achieve considerably high recognition accuracy, but the number of residual features also remains quite high. For high-dimensional data, reducing the number of features is also very important to minimize the computational cost of the overall classification process. In this work, a wrapper fitness function is proposed that combines classification accuracy with a penalty term that penalizes a large number of features. The proposed wrapper fitness function is used for feature subset evaluation and subsequent selection of the optimal feature subset with several EC algorithms. Simulation experiments are carried out on several benchmark data sets with small to large numbers of features. The results show that the proposed wrapper fitness function is efficient in reducing the number of features in the final selected feature subset without a significant reduction in classification accuracy. The proposed fitness function has been shown to perform well for high-dimensional data sets with dimension up to 10,000.
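The abstract does not state the exact form of the penalty term, so the following is only a minimal sketch of the general idea: a wrapper fitness that rewards classification accuracy and subtracts a penalty that grows with the number of selected features. The linear penalty, the `penalty_weight` parameter, the leave-one-out 1-nearest-neighbour classifier, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the features where mask is 1 (assumed wrapper classifier)."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            dist = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def penalized_fitness(X, y, mask, penalty_weight=0.1):
    """Assumed form: accuracy minus penalty_weight times the
    fraction of features kept. The paper's penalty may differ."""
    accuracy = loo_1nn_accuracy(X, y, mask)
    return accuracy - penalty_weight * sum(mask) / len(mask)


# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
X = [[0.0, 5, 1], [0.1, 1, 4], [0.2, 3, 2],
     [1.0, 4, 1], [1.1, 2, 5], [1.2, 5, 3]]
y = [0, 0, 0, 1, 1, 1]

small_subset = penalized_fitness(X, y, [1, 0, 0])
full_subset = penalized_fitness(X, y, [1, 1, 1])
```

In this sketch the single informative feature scores higher than the full feature set, because the full set cannot exceed an accuracy of 1.0 and pays the larger penalty. An EC algorithm such as a genetic algorithm would evaluate candidate bit masks with a function of this kind instead of raw accuracy.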
Pages: 163-180
Number of pages: 18