A novel community detection based genetic algorithm for feature selection

Cited by: 97
Authors
Rostami, Mehrdad [1]
Berahmand, Kamal [2]
Forouzandeh, Saman [3]
Affiliations
[1] Univ Kurdistan, Dept Comp Engn, Sanandaj, Iran
[2] Queensland Univ Technol, Dept Sci & Engn, Brisbane, Qld, Australia
[3] Univ Appl Sci & Technol, Ctr Tehran Municipal, ICT Org, Dept Comp Engn, Tehran, Iran
Keywords
Machine learning; Feature selection; Genetic algorithm; Graph theory; Multi-objective; Particle swarm optimization; Mutual information; Classification; Scheme
DOI
10.1186/s40537-020-00398-3
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Feature selection is an essential data preprocessing stage in data mining. Its core principle is to pick a subset of the available features by excluding features with almost no predictive information as well as highly correlated, redundant features. In the past several years, a variety of meta-heuristic methods have been introduced to eliminate as many redundant and irrelevant features as possible from high-dimensional datasets. One of the main disadvantages of current meta-heuristic approaches is that they often neglect the correlation among the selected features. In this article, the authors propose a genetic algorithm for feature selection based on community detection, which works in three steps. In the first step, the feature similarities are calculated. In the second step, the features are grouped into clusters by community detection algorithms. In the third step, features are picked by a genetic algorithm with a new community-based repair operation. The performance of the presented approach was analyzed on nine benchmark classification problems. The authors also compared the efficiency of the proposed approach with the results of four available feature selection algorithms. A comparison with three recent feature selection methods based on the PSO, ACO, and ABC algorithms on three classifiers showed that the accuracy of the proposed method is on average 0.52% higher than PSO, 1.20% higher than ACO, and 1.57% higher than ABC.
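
The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the similarity measure, the community detection algorithm, the repair rule, or the fitness function. The sketch assumes absolute Pearson correlation as the similarity, uses connected components of a thresholded similarity graph as a stand-in for community detection, assumes the repair step forces a fixed number of selected features per community, and scores subsets with a nearest-centroid training accuracy. All function names and parameters (feature_similarity, detect_communities, repair, fitness, select_features, threshold, per_community) are hypothetical.

```python
import numpy as np

def feature_similarity(X):
    """Step 1 (assumed measure): absolute Pearson correlation between features."""
    sim = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(sim, 0.0)
    return sim

def detect_communities(sim, threshold=0.5):
    """Step 2 (stand-in): connected components of the thresholded similarity graph."""
    n = sim.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(sim[node] >= threshold):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return np.array(labels)

def repair(mask, labels, per_community, rng):
    """Assumed repair rule: force exactly min(per_community, size) picks per community."""
    mask = mask.copy()
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        chosen = members[mask[members]]
        k = min(per_community, len(members))
        if len(chosen) > k:    # too many picks: drop random extras
            mask[rng.choice(chosen, size=len(chosen) - k, replace=False)] = False
        elif len(chosen) < k:  # too few picks: add random unselected members
            unchosen = members[~mask[members]]
            mask[rng.choice(unchosen, size=k - len(chosen), replace=False)] = True
    return mask

def fitness(X, y, mask):
    """Hypothetical fitness: nearest-centroid training accuracy on selected features."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

def select_features(X, y, per_community=1, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Step 3: a plain genetic algorithm whose offspring are all repaired."""
    rng = np.random.default_rng(seed)
    labels = detect_communities(feature_similarity(X))
    n = X.shape[1]
    pop = [repair(rng.random(n) < 0.5, labels, per_community, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(X, y, m) for m in pop])
        order = np.argsort(scores)[::-1]
        parents = [pop[i] for i in order[: pop_size // 2]]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            child = np.where(rng.random(n) < 0.5, parents[i], parents[j])  # uniform crossover
            child ^= rng.random(n) < mutation_rate                          # bit-flip mutation
            children.append(repair(child, labels, per_community, rng))
        pop = parents + children
    scores = [fitness(X, y, m) for m in pop]
    return pop[int(np.argmax(scores))], labels
```

Calling select_features(X, y) on a samples-by-features array X and a label vector y returns a boolean mask over the features together with the community label of each feature. Because every individual passes through repair, highly similar features in the same community cannot all be selected at once, which is the redundancy control the abstract attributes to the community-based repair operation.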
Pages: 27