Mutual Information Estimation for Filter Based Feature Selection Using Particle Swarm Optimization

Cited by: 8
Authors
Hoai Bach Nguyen [1]
Xue, Bing [1]
Andreae, Peter [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
Source
APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2016, PT I | 2016 / Vol. 9597
Keywords
Feature selection; Mutual information estimation; Particle swarm optimization; CLASSIFICATION; RANKING
DOI
10.1007/978-3-319-31204-0_46
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a pre-processing step in classification that selects a small set of important features to improve classification performance and efficiency. Mutual information is very popular in feature selection because it can detect non-linear relationships between features. However, the existing mutual information approaches only consider two-way interactions between features. In addition, in most methods, mutual information is calculated by a counting approach, which may lead to inaccurate results. This paper proposes a filter feature selection algorithm based on particle swarm optimization (PSO), named PSOMIE, which employs a novel fitness function using nearest neighbor mutual information estimation (NNE) to measure the quality of a feature set. PSOMIE is compared with using all features and with two traditional feature selection approaches. The experimental results show that the mutual information estimation successfully guides PSO to search for a small number of features while maintaining or improving the classification performance over using all features and the traditional feature selection methods. In addition, PSOMIE provides strong consistency between training and test results, which may help avoid overfitting.
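The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' PSOMIE implementation: the abstract does not give the fitness function or the PSO variant, so everything below is an assumption made for illustration. The estimator follows the general nearest-neighbor scheme for a discrete class label and a continuous feature subset, the swarm is a standard binary PSO with a sigmoid transfer function, and the names `knn_mutual_info`, `fitness`, `binary_pso` and the size penalty `alpha` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def chebyshev(a, b, subset):
    """Max-norm distance between two samples, restricted to the chosen features."""
    return max(abs(a[j] - b[j]) for j in subset)


def knn_mutual_info(X, y, subset, k=3):
    """Nearest-neighbor estimate of I(X[subset]; y) for a discrete label y.

    Uses psi(N) - mean(psi(N_y)) + psi(k) - mean(psi(m)), where m counts all
    samples within the distance to the k-th same-class neighbor.
    """
    n = len(X)
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    if min(counts.values()) <= k:
        raise ValueError("every class needs more than k samples")
    sum_psi_ny = 0.0
    sum_psi_m = 0.0
    for i in range(n):
        dists = [chebyshev(X[i], X[j], subset) for j in range(n)]
        same = sorted(dists[j] for j in range(n) if j != i and y[j] == y[i])
        radius = same[k - 1]  # distance to the k-th same-class neighbor
        m = sum(1 for j in range(n) if j != i and dists[j] <= radius)
        sum_psi_ny += digamma_int(counts[y[i]])
        sum_psi_m += digamma_int(m)
    mi = digamma_int(n) - sum_psi_ny / n + digamma_int(k) - sum_psi_m / n
    return max(0.0, mi)  # the raw estimate can dip slightly below zero


def fitness(X, y, mask, k=3, alpha=0.01):
    """Hypothetical fitness: estimated MI minus a small subset-size penalty."""
    subset = [j for j, bit in enumerate(mask) if bit]
    if not subset:
        return float("-inf")
    return knn_mutual_info(X, y, subset, k) - alpha * len(subset)


def binary_pso(X, y, n_particles=8, n_iters=15, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard binary PSO over feature masks (an assumed variant)."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Calling `binary_pso(X, y)` on a list of numeric feature rows `X` and class labels `y` returns a 0/1 feature mask and its fitness. Because the estimator scores a whole subset at once, it can in principle reflect interactions among more than two features, which is the limitation of pairwise approaches that the abstract points to.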
Pages: 719-736
Page count: 18