Dimensionality Reduction: Is Feature Selection More Effective Than Random Selection?

Cited by: 1

Authors:

Moran-Fernandez, Laura [1]

Bolon-Canedo, Veronica [1]

Affiliation:

[1] Univ A Coruna, CITIC, La Coruna, Spain

Source:

ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2021, PT I | 2021 / Vol. 12861

Keywords:

Dimensionality reduction; Feature selection; Filters; Classification; CLASSIFIERS;

DOI:

10.1007/978-3-030-85030-2_10

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The advent of Big Data has brought an unprecedented increase in data volume, not only in the number of samples but also in the number of available features. Feature selection, the process of selecting the relevant features and discarding the irrelevant ones, has been applied successfully over the last decades to reduce the dimensionality of datasets. However, a large number of feature selection methods are available in the literature, and choosing the right one for a given problem is not a trivial decision. In this paper we try to determine which of these methods are best suited to a particular type of problem, and we study their effectiveness by comparing them with a random selection of features. Our experiments use a large number of datasets covering a wide variety of real-world problems in this field. Seven popular feature selection methods were used, together with five different classifiers to evaluate their performance. The experimental results suggest that feature selection is, in general, a powerful tool in machine learning, with correlation-based feature selection (CFS) being the best option regardless of the scenario. We also found that choosing an inappropriate threshold when using ranker methods leads to results as poor as those obtained by randomly selecting a subset of features.
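The comparison described in the abstract (a filter ranker cut at a threshold versus a random subset of the same size, each scored by a classifier) can be sketched with the Python standard library. This is a hypothetical illustration, not the authors' experimental protocol: the ranking criterion (absolute Pearson correlation with the class), the nearest-centroid classifier, the leave-one-out evaluation, the toy data and all function names are assumptions. The paper itself uses seven feature selection methods and five classifiers on real datasets.

```python
import random
import statistics
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def rank_features(X: Matrix, y: Sequence[int]) -> List[int]:
    """Hypothetical univariate ranker: indices ordered by |r(feature, class)|, best first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])  # stable on ties


def select_top_k(ranking: List[int], k: int) -> List[int]:
    """Threshold step of a ranker: keep only the k best-ranked features."""
    return ranking[:k]


def random_subset(n_features: int, k: int, seed: int = 0) -> List[int]:
    """Random-selection baseline: k distinct feature indices drawn uniformly."""
    return sorted(random.Random(seed).sample(range(n_features), k))


def loo_accuracy(X: Matrix, y: Sequence[int], features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroids[label] = [statistics.fmean(row[j] for row in rows) for j in features]
        point = [X[i][j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


# Hypothetical toy data: columns 1 and 3 shift with the class;
# columns 0, 2 and 4 have identical means in both classes (pure noise).
y = [0] * 6 + [1] * 6
columns = [
    [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    [0, 1, 0, 1, 0, 1, 3, 4, 3, 4, 3, 4],
    [2, 0, 1, 1, 0, 2, 0, 2, 1, 2, 1, 0],
    [0, 0, 1, 1, 0, 0, 2, 2, 3, 3, 2, 2],
    [3, 1, 2, 2, 3, 1, 1, 3, 2, 3, 1, 2],
]
X = [list(map(float, row)) for row in zip(*columns)]

if __name__ == "__main__":
    ranking = rank_features(X, y)
    for k in (1, 2, 5):
        ranked = select_top_k(ranking, k)
        rand = random_subset(len(columns), k, seed=0)
        print(k, ranked, loo_accuracy(X, y, ranked), rand, loo_accuracy(X, y, rand))
```

On this toy set the ranker orders the features as `[1, 3, 0, 2, 4]`, so a threshold of k = 2 keeps columns 1 and 3 and reaches a leave-one-out accuracy of 1.0, while a random draw that happens to land on columns 0 and 2 scores 0.0. The threshold matters because a k that admits noisy columns pulls the ranker's subset toward what a random draw would give.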


Pages: 113-125

Number of pages: 13

References (23 in total):

[1] Bache K., UCI machine learning repository

[2] Benavoli A., 2017, J MACH LEARN RES, V18

[3] A review of microarray datasets and applied feature selection methods