Dimensionality Reduction: Is Feature Selection More Effective Than Random Selection?

Cited by: 1
Authors
Moran-Fernandez, Laura [1]
Bolon-Canedo, Veronica [1]
Affiliations
[1] Univ A Coruna, CITIC, La Coruna, Spain
Source
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2021, PT I | 2021 / Vol. 12861
Keywords
Dimensionality reduction; Feature selection; Filters; Classification; CLASSIFIERS;
DOI
10.1007/978-3-030-85030-2_10
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The advent of Big Data has brought an unprecedented increase in data volume, not only in samples but also in available features. Feature selection, the process of selecting the relevant features and discarding the irrelevant ones, has been successfully applied over the last decades to reduce the dimensionality of datasets. However, a great number of feature selection methods are available in the literature, and choosing the right one for a given problem is not a trivial decision. In this paper we try to determine which of the many methods in the literature are best suited to a particular type of problem, and study their effectiveness by comparing them with random selection. Our experiments use a large number of datasets covering a wide variety of real-world problems encountered in this field. Seven popular feature selection methods were used, along with five different classifiers to evaluate their performance. The experimental results suggest that feature selection is, in general, a powerful tool in machine learning, with correlation-based feature selection being the best option regardless of the scenario. We also found that choosing an inappropriate threshold for ranker methods leads to results as poor as randomly selecting a subset of features.
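The comparison described in the abstract (a ranker filter with a threshold versus a random subset of the same size) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' experimental setup: the synthetic data, the use of absolute Pearson correlation as the ranking score, the nearest-centroid classifier, and every function name below are assumptions chosen for illustration only.

```python
import random


def make_data(n_samples, n_informative, n_noise, rng):
    """Synthetic binary data (assumption): informative features shift with the label."""
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def rank_features(X, y):
    """Ranker filter: feature indices sorted from highest to lowest score."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Accuracy of a nearest-centroid classifier restricted to the given features."""
    centroids = {}
    for label in set(y_train):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for row, label in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c])),
        )
        correct += pred == label
    return correct / len(y_test)


rng = random.Random(0)
X, y = make_data(400, n_informative=5, n_noise=45, rng=rng)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

k = 5  # the ranker threshold: how many top-ranked features to keep
top_k = rank_features(X_train, y_train)[:k]
random_k = rng.sample(range(len(X[0])), k)  # random baseline of the same size

acc_filter = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, top_k)
acc_random = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, random_k)
print("filter:", acc_filter, "random:", acc_random)
```

Varying `k` in this sketch is one way to explore the threshold effect the abstract mentions: a threshold far from the true number of informative features lets noise features into the selected subset, which can erode the advantage over the random baseline.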
Pages: 113-125
Page count: 13