Mutual Information Estimation for Filter Based Feature Selection Using Particle Swarm Optimization

Cited by: 8
Authors
Nguyen, Hoai Bach [1]
Xue, Bing [1]
Andreae, Peter [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
Source
APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2016, PT I | 2016, Vol. 9597
Keywords
Feature selection; Mutual information estimation; Particle swarm optimization; Classification; Ranking
DOI
10.1007/978-3-319-31204-0_46
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a pre-processing step in classification that selects a small set of important features to improve classification performance and efficiency. Mutual information is very popular in feature selection because it can detect non-linear relationships between features. However, existing mutual information approaches consider only two-way interactions between features. In addition, most methods calculate mutual information by a counting approach, which may lead to inaccurate results. This paper proposes a filter feature selection algorithm based on particle swarm optimization (PSO), named PSOMIE, which employs a novel fitness function that uses nearest neighbor mutual information estimation (NNE) to measure the quality of a feature set. PSOMIE is compared with using all features and with two traditional feature selection approaches. The experimental results show that the mutual information estimation successfully guides PSO to search for a small number of features while maintaining or improving classification performance over using all features and over the traditional feature selection methods. In addition, PSOMIE provides strong consistency between training and test results, which may help to avoid overfitting.
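For readers unfamiliar with the "counting approach" the abstract refers to, the following is a minimal sketch of a plug-in mutual information estimate between one discrete feature and the class label. It is not the paper's NNE estimator or the PSOMIE fitness function; the function name `counting_mutual_information` and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def counting_mutual_information(xs, ys):
    """Plug-in (counting) estimate of I(X; Y) in bits.

    xs, ys: equal-length sequences of discrete (hashable) values.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")

    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (assumed): a feature that determines the label, and one that does not.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

print(counting_mutual_information(informative, labels))    # 1.0
print(counting_mutual_information(uninformative, labels))  # 0.0
```

Because the estimate relies on frequency counts, continuous features must first be discretized, and the number of joint cells grows quickly when several features are scored together. Both are plausible sources of the inaccuracy the abstract mentions.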
Pages: 719-736
Number of pages: 18