A Novel Approach to Increase the Efficiency of Filter-Based Feature Selection Methods in High-Dimensional Datasets With Strong Correlation Structure

Cited by: 6
Authors
Akogul, Serkan [1]
Affiliations
[1] Pamukkale Univ, Fac Sci, Dept Stat, TR-20160 Denizli, Turkiye
Keywords
Feature selection; filter feature selection; Gaussian mixture model (GMM); Gaussian mixture discriminant analysis (GMDA); DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; CLASSIFICATION
DOI
10.1109/ACCESS.2023.3325331
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Advances in information and measurement technologies have increased the dimensionality of data. With high-dimensional data, many analyses, such as classification and regression, require a data reduction step beforehand. Filter feature selection methods based on statistical criteria are widely used for this purpose because of their simplicity and efficiency. An important problem with these methods is that, when features are strongly correlated, they unnecessarily select multiple features carrying the same information. This study proposes a novel approach to solve this problem. The proposed approach also answers the question of how many features should be included. Its performance is demonstrated on high-dimensional reflectance data with high correlations between features. The results show that the proposed approach improves the classification performance of filter feature selection methods in mixture discriminant analysis in terms of classification accuracy and entropy criteria.
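The redundancy problem described in the abstract can be illustrated with a small synthetic sketch. The abstract does not specify the proposed method, so the code below is not the author's approach: it only shows how a plain filter ranking picks near-duplicate features, and uses a simple correlation threshold as a stand-in remedy. The helper names (`f_scores`, `top_k`, `greedy_decorrelated`), the threshold `max_abs_corr=0.9`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def f_scores(X, y):
    """One-way ANOVA F statistic of each column of X against class labels y."""
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (k - 1)) / (within / (n - k))

def top_k(scores, k):
    """Plain filter selection: the k highest-scoring features."""
    return [int(j) for j in np.argsort(scores)[::-1][:k]]

def greedy_decorrelated(X, scores, k, max_abs_corr=0.9):
    """Walk the ranking and skip any feature that is strongly correlated
    with one already selected (hypothetical threshold, not from the paper)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in np.argsort(scores)[::-1]:
        if all(corr[j, s] < max_abs_corr for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected

# Synthetic data (assumed): features 0-4 are noisy copies of one latent signal,
# features 5-9 are noisy copies of a weaker one, features 10-14 are pure noise.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
latent_a = y + 0.5 * rng.standard_normal(n)
latent_b = 0.6 * y + 0.5 * rng.standard_normal(n)
block_a = latent_a[:, None] + 0.05 * rng.standard_normal((n, 5))
block_b = latent_b[:, None] + 0.05 * rng.standard_normal((n, 5))
noise = rng.standard_normal((n, 5))
X = np.hstack([block_a, block_b, noise])

scores = f_scores(X, y)
plain = top_k(scores, 3)
pruned = greedy_decorrelated(X, scores, 3)
print("plain top-3:", plain)
print("correlation-pruned top-3:", pruned)
```

In this sketch the plain ranking spends all three slots on copies of the same signal, while the pruned ranking takes one feature per correlated group. Note that the pruned selection still needs the number of features `k` fixed in advance, which is the second question the abstract says the proposed approach answers.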
Pages: 115025-115032
Number of pages: 8