Feature selection based on rough set approach, wrapper approach, and binary whale optimization algorithm

Cited by: 77
Authors
Tawhid, Mohamed A. [1]
Ibrahim, Abdelmonem M. [1, 2]
Affiliations
[1] Thompson Rivers Univ, Fac Sci, Dept Math, Stat, Kamloops, BC V2C 0C8, Canada
[2] Al Azhar Univ, Fac Sci, Dept Math, Assiut Branch, Assiut, Egypt
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Feature selection; Classification; Whale optimization algorithm; Rough set theory; Wrapper approach; Logistic regression; ATTRIBUTE REDUCTION; DIFFERENTIAL EVOLUTION; SEARCH; CLASSIFICATION; CANCER; SVM;
DOI
10.1007/s13042-019-00996-5
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The aim of any approach to the feature selection problem is to find a subset of the original features. Since finding a minimal subset of the features is an NP-hard problem, practical and efficient heuristic algorithms are needed. The whale optimization algorithm is a recently developed nature-inspired meta-heuristic that imitates the hunting behavior of humpback whales to solve continuous optimization problems. In this paper, we propose a novel binary whale optimization algorithm (BWOA) to solve the feature selection problem. BWOA is especially desirable for feature selection when no heuristic information is available to lead the search to the optimal minimal subset; even then, the whales can find the best features as they hunt the prey. Rough set theory (RST) is one of the effective approaches to feature selection. In the first experiment, we combine RST with BWOA; in the second, we use a wrapper approach with BWOA on three different classifiers. We verify the performance and effectiveness of the proposed algorithm through experiments on 32 datasets from the UCI machine learning repository, comparing it with several powerful existing algorithms from the literature. Furthermore, we apply two nonparametric statistical tests, the Wilcoxon signed-rank test and the Friedman test, at the 5% significance level. Our results show that the proposed algorithm can provide an efficient tool for finding a minimal subset of the features.
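The abstract names the ingredients but not the update rules. What follows is a minimal Python sketch of how a binary WOA can drive rough-set feature selection, assuming a discrete decision table. The dependency degree gamma_B(D) = |POS_B(D)| / |U| is the standard rough-set measure, and the position updates follow the usual WOA encircling, random-search, and spiral equations. Several pieces are illustrative assumptions and not the authors' exact formulation: the weighted fitness (`alpha = 0.9`), the steep sigmoid transfer function used for binarization, and every function name (`dependency_degree`, `fitness`, `transfer`, `bwoa`).

```python
import math
import random
from collections import defaultdict


def dependency_degree(rows, labels, subset):
    """Rough-set dependency gamma_B(D) = |POS_B(D)| / |U| on a discrete decision table."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        # Objects with equal values on every feature in `subset` are indiscernible.
        blocks[tuple(row[j] for j in subset)].append(label)
    # A block belongs to the positive region if all its objects share one decision.
    positive = sum(len(b) for b in blocks.values() if len(set(b)) == 1)
    return positive / len(rows)


def fitness(mask, rows, labels, alpha=0.9):
    """Hypothetical weighted fitness: reward dependency and reward fewer selected features."""
    subset = [j for j, bit in enumerate(mask) if bit]
    n = len(mask)
    return alpha * dependency_degree(rows, labels, subset) + (1 - alpha) * (n - len(subset)) / n


def transfer(x):
    """Assumed steep S-shaped transfer function: probability that a bit is set to 1."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


def bwoa(rows, labels, n_whales=10, n_iter=30, b=1.0, seed=0):
    """Maximize `fitness` over 0/1 feature masks with a binarized WOA update."""
    rng = random.Random(seed)
    dim = len(rows[0])
    whales = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_whales)]
    best = max(whales, key=lambda w: fitness(w, rows, labels))[:]
    best_fit = fitness(best, rows, labels)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p, ell = rng.random(), rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                step = [r - A * abs(C * r - xj) for r, xj in zip(ref, x)]
            else:
                # Bubble-net spiral update around the best whale.
                step = [abs(bj - xj) * math.exp(b * ell) * math.cos(2 * math.pi * ell) + bj
                        for bj, xj in zip(best, x)]
            # Binarize every dimension of the continuous step through the transfer function.
            whales[i] = [1 if rng.random() < transfer(s) else 0 for s in step]
            f = fitness(whales[i], rows, labels)
            if f > best_fit:
                best, best_fit = whales[i][:], f
    return best, best_fit


if __name__ == "__main__":
    # Toy table: feature 0 equals the decision, features 1 and 2 are noise.
    rows = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
    labels = [r[0] for r in rows]
    print(bwoa(rows, labels))
```

Replacing `fitness` with a classifier's cross-validated accuracy on the selected columns would turn the same loop into a wrapper-style search, in the spirit of the paper's second experiment. The classifiers and validation protocol the authors actually used are not given in the abstract.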
Pages: 573-602
Page count: 30
Related papers: 79 in total