New Feature Selection Algorithm Based on Feature Stability and Correlation

Cited by: 8
Authors
Al-Shalabi, Luai [1]
Affiliations
[1] Arab Open Univ, Fac Comp Studies, Al Ardia 92400, Kuwait
Keywords
Feature extraction; Classification algorithms; Machine learning algorithms; Dimensionality reduction; Correlation; Filtering theory; Information filters; feature selection; stability; FILTER; CLASSIFICATION; REDUCTION; DATASETS
DOI
10.1109/ACCESS.2022.3140209
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Analyzing large datasets with many rows and columns increases the load on machine learning algorithms. Such data are likely to contain noise, which degrades the performance of those algorithms. Feature selection (FS) is one of the most essential machine learning techniques for addressing this problem. It tries to identify and eliminate as much irrelevant information as possible and to retain only a minimum subset of appropriate features. It plays an important role in improving the accuracy of machine learning algorithms and also reduces computational complexity, run time, storage, and cost. In this paper, a new feature selection algorithm based on feature stability and correlation is proposed to select an effective minimum subset of appropriate features. The efficiency of the proposed algorithm was evaluated by comparing it with other state-of-the-art dimensionality reduction (DR) algorithms on benchmark datasets. The evaluation criteria included the size of the minimum subset, the classification accuracy, the F-measure, and the area under the curve (AUC). The results showed that the proposed algorithm led the compared algorithms in reducing a given dataset while retaining high predictive accuracy.
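The abstract does not spell out the algorithm, so the following is only an illustrative sketch of a generic filter that combines stability and correlation. It is not the method proposed in the paper. The function names (`pearson`, `select_features`), the parameters (`top_k`, `rounds`, `min_freq`, `max_redundancy`), and the definition of stability as selection frequency across bootstrap resamples are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, labels, top_k=2, rounds=30,
                    min_freq=0.6, max_redundancy=0.9, seed=0):
    """Hypothetical filter: keep features that are stably relevant and not redundant.

    columns: dict mapping feature name -> list of numeric values
    labels:  list of numeric class labels (e.g. 0/1)
    """
    rng = random.Random(seed)
    n = len(labels)
    counts = {name: 0 for name in columns}

    # Stability (assumed definition): how often a feature lands in the
    # top_k by |correlation with the label| across bootstrap resamples.
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        scores = {name: abs(pearson([col[i] for i in idx], y))
                  for name, col in columns.items()}
        for name in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            counts[name] += 1

    stable = [name for name in columns if counts[name] / rounds >= min_freq]
    # Rank the stable features by relevance on the full data.
    stable.sort(key=lambda name: abs(pearson(columns[name], labels)), reverse=True)

    # Redundancy: skip a feature strongly correlated with one already kept.
    kept = []
    for name in stable:
        if all(abs(pearson(columns[name], columns[k])) < max_redundancy for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy data: one informative feature, a linear copy of it, and pure noise.
    rng = random.Random(1)
    labels = [i % 2 for i in range(200)]
    signal = [y + rng.gauss(0, 0.3) for y in labels]
    copy = [2 * s + 1 for s in signal]
    noise = [rng.gauss(0, 1) for _ in labels]
    print(select_features({"signal": signal, "copy": copy, "noise": noise}, labels))
```

On this toy data the sketch should keep exactly one of the two linearly related features and drop the noise column. A real evaluation would then compare classifiers trained on the reduced and full feature sets using the criteria listed in the abstract (subset size, accuracy, F-measure, AUC).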
Pages: 4699-4713
Number of pages: 15
Related papers
58 in total
[1]  
Al Shalabi L., 2006, Journal of Computer Sciences, V2, P735, DOI 10.3844/jcssp.2006.735.739
[2]  
Al Shalabi L, 2019, INT ARAB J INF TECHN, V16, P203
[3]  
[Anonymous], 2013, Biological Knowledge Discovery Handbook
[4]  
[Anonymous], 2009, Introduction to Algorithms
[5]   A Comprehensive Empirical Comparison of Modern Supervised Classification and Feature Selection Methods for Text Categorization [J].
Aphinyanaphongs, Yindalon ;
Fu, Lawrence D. ;
Li, Zhiguo ;
Peskin, Eric R. ;
Efstathiadis, Efstratios ;
Aliferis, Constantin F. ;
Statnikov, Alexander .
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2014, 65 (10) :1964-1987
[6]   Frequency Limited & Weighted Model Reduction Algorithm With Error Bound: Application to Discrete-Time Doubly Fed Induction Generator Based Wind Turbines for Power System [J].
Bashir, Sajid ;
Imran, Muhammad ;
Batool, Sammana ;
Imran, Muhammad ;
Ahmad, Mian Ilyas ;
Malik, Fahad Mumtaz ;
Salman, Muhammad ;
Wakeel, Abdul ;
Ali, Usman .
IEEE ACCESS, 2021, 9 :9505-9534
[7]   Adapting the CMIM algorithm for multilabel feature selection. A comparison with existing methods [J].
Bermejo, Pablo ;
Gamez, Jose A. ;
Puerta, Jose M. .
EXPERT SYSTEMS, 2018, 35 (01)
[8]  
Biesiada J, 2007, ADV INTEL SOFT COMPU, V45, P242
[9]   A review of microarray datasets and applied feature selection methods [J].
Bolon-Canedo, V. ;
Sanchez-Marono, N. ;
Alonso-Betanzos, A. ;
Benitez, J. M. ;
Herrera, F. .
INFORMATION SCIENCES, 2014, 282 :111-135
[10]   A review of feature selection methods on synthetic data [J].
Bolon-Canedo, Veronica ;
Sanchez-Marono, Noelia ;
Alonso-Betanzos, Amparo .
KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 34 (03) :483-519