Framework for the Ensemble of Feature Selection Methods

Cited by: 25
Authors
Mera-Gaona, Maritza [1 ]
Lopez, Diego M. [1 ]
Vargas-Canas, Rubiel [1 ]
Neumann, Ursula [2 ]
Affiliations
[1] Univ Cauca, Fac Elect Engn & Telecommun, Campus Tulcan, Popayan 190001, Colombia
[2] Fraunhofer Inst Integrated Circuits IIS, Fraunhofer IIS, Div Supply Chain Serv SCS, Grp Data Sci, D-90411 Nurnberg, Germany
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords
feature selection; variable elimination; relevant features; consensus; ensemble learning; ensemble feature selection; RELEVANT FEATURES; DIMENSION REDUCTION; CLASSIFICATION; OPTIMIZATION; INFORMATION; ALGORITHM;
DOI
10.3390/app11178122
CLC number
O6 [Chemistry];
Discipline code
0703;
Abstract
Feature selection (FS) has attracted the attention of many researchers in recent years due to the increasing sizes of datasets, which may contain hundreds or thousands of columns (features). Typically, not all columns represent relevant values. Consequently, noisy or irrelevant columns can confuse the algorithms, leading to weak performance of machine learning models. Different FS algorithms have been proposed to analyze high-dimensional datasets and determine their subsets of relevant features to overcome this problem. However, very often, FS algorithms are biased by the data. Thus, ensemble feature selection (EFS) methods have become an alternative that integrates the advantages of single FS algorithms and compensates for their disadvantages. The objective of this research is to propose a conceptual and implementation framework to understand the main concepts and relationships in the process of aggregating FS algorithms and to demonstrate how to address FS on datasets with high dimensionality. The proposed conceptual framework is validated by deriving an implementation framework, which incorporates a set of Python packages with functionalities to support the assembly of feature selection algorithms. The performance of the implementation framework was demonstrated in several experiments discovering relevant features in the Sonar, SPECTF, and WDBC datasets. The experiments contrasted the accuracy of two machine learning classifiers (decision tree and logistic regression) trained with subsets of features generated either by single FS algorithms or by the set of features selected by the ensemble feature selection framework. We observed that for the three datasets used (Sonar, SPECTF, and WDBC), the highest precision percentages (86.95%, 74.73%, and 93.85%, respectively) were obtained when the classifiers were trained with the subset of features generated by our framework.
Additionally, the stability of the feature sets generated using our ensemble method was evaluated. The results showed that the method achieved perfect stability for the three datasets used in the evaluation.
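The aggregation idea described in the abstract, combining the outputs of several FS algorithms by consensus, can be sketched in plain Python. The three scorers and the majority-vote rule below are illustrative choices for exposition, not the paper's actual implementation or its packages:

```python
import random

def variance_score(X, y):
    """Score each feature by its variance (the target y is ignored)."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        scores.append(sum((v - mean) ** 2 for v in col) / n)
    return scores

def correlation_score(X, y):
    """Score each feature by |Pearson correlation| with the target."""
    n = len(X)
    my = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        var_x = sum((a - mx) ** 2 for a in col)
        var_y = sum((b - my) ** 2 for b in y)
        scores.append(abs(cov) / ((var_x * var_y) ** 0.5 + 1e-12))
    return scores

def range_score(X, y):
    """Score each feature by its value range, max - min (y is ignored)."""
    return [max(r[j] for r in X) - min(r[j] for r in X)
            for j in range(len(X[0]))]

def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])

def ensemble_select(X, y, selectors, k, min_votes):
    """Each selector votes for its top-k features; keep those with enough votes."""
    votes = [0] * len(X[0])
    for selector in selectors:
        for j in top_k(selector(X, y), k):
            votes[j] += 1
    return sorted(j for j, v in enumerate(votes) if v >= min_votes)

# Synthetic demo: feature 0 tracks the binary target, features 1-2 are noise.
random.seed(0)
y = [i % 2 for i in range(20)]
X = [[yi * 2.0 + random.uniform(-0.1, 0.1),
      random.uniform(-0.1, 0.1),
      random.uniform(-0.1, 0.1)] for yi in y]
selected = ensemble_select(X, y, [variance_score, correlation_score, range_score],
                           k=1, min_votes=2)
print(selected)  # -> [0]: all three selectors agree on the informative feature
```

The consensus threshold (`min_votes`) is the key design knob: requiring agreement from a majority of selectors filters out features that only one biased algorithm favors, which is the motivation for EFS stated in the abstract.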
Pages: 16
Related papers
50 records
  • [1] An Ensemble Feature Selection Framework Integrating Stability
    Zhang, Xiaokang
    Jonassen, Inge
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 2792 - 2798
  • [2] Ensemble Feature Selection Framework for Paddy Yield Prediction in Cauvery Basin using Machine Learning Classifiers
    Sathya, P.
    Gnanasekaran, P.
    COGENT ENGINEERING, 2023, 10 (02):
  • [3] Ensemble Feature Selection for Heart Disease Classification
    Benhar, Houda
    Idri, Ali
    Hosni, Mohamed
    HEALTHINF: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 5: HEALTHINF, 2021, : 369 - 376
  • [4] Classifier ensemble methods in feature selection
    Kiziloz, Hakan Ezgi
    NEUROCOMPUTING, 2021, 419 : 97 - 107
  • [5] Empirical Evaluation of the Ensemble Framework for Feature Selection in DDoS Attack
    Das, Saikat
    Venugopal, Deepak
    Shiva, Sajjan
    Sheldon, Frederick T.
    2020 7TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (CSCLOUD 2020)/2020 6TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (EDGECOM 2020), 2020, : 56 - 61
  • [6] Unsupervised feature selection with ensemble learning
    Elghazel, Haytham
    Aussem, Alex
    MACHINE LEARNING, 2015, 98 (1-2) : 157 - 180
  • [7] A parallel metaheuristic approach for ensemble feature selection based on multi-core architectures
    Hijazi, Neveen Mohammed
    Faris, Hossam
    Aljarah, Ibrahim
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 182
  • [8] Bi-level ensemble method for unsupervised feature selection
    Zhou, Peng
    Wang, Xia
    Du, Liang
    INFORMATION FUSION, 2023, 100
  • [9] Training error and sensitivity-based ensemble feature selection
    Ng, Wing W. Y.
    Tuo, Yuxi
    Zhang, Jianjun
    Kwong, Sam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (10) : 2313 - 2326
  • [10] Explainable feature selection and ensemble classification via feature polarity
    Zhou, Peng
    Liang, Ji
    Yan, Yuanting
    Zhao, Shu
    Wu, Xindong
    INFORMATION SCIENCES, 2024, 676