Framework for the Ensemble of Feature Selection Methods

Cited by: 25
Authors
Mera-Gaona, Maritza [1 ]
Lopez, Diego M. [1 ]
Vargas-Canas, Rubiel [1 ]
Neumann, Ursula [2 ]
Affiliations
[1] Univ Cauca, Fac Elect Engn & Telecommun, Campus Tulcan, Popayan 190001, Colombia
[2] Fraunhofer Inst Integrated Circuits IIS, Fraunhofer IIS, Div Supply Chain Serv SCS, Grp Data Sci, D-90411 Nurnberg, Germany
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 17
Keywords
feature selection; variable elimination; relevant features; consensus; ensemble learning; ensemble feature selection; RELEVANT FEATURES; DIMENSION REDUCTION; CLASSIFICATION; OPTIMIZATION; INFORMATION; ALGORITHM;
DOI
10.3390/app11178122
CLC number
O6 [Chemistry];
Discipline code
0703;
Abstract
Feature selection (FS) has attracted the attention of many researchers in recent years due to the increasing sizes of datasets, which may contain hundreds or thousands of columns (features). Typically, not all columns represent relevant values. Consequently, noisy or irrelevant columns can confuse the algorithms, leading to weak performance of machine learning models. Different FS algorithms have been proposed to analyze high-dimensional datasets and determine their subsets of relevant features to overcome this problem. However, very often, FS algorithms are biased by the data. Thus, ensemble feature selection (EFS) methods have become an alternative that integrates the advantages of single FS algorithms and compensates for their disadvantages. The objective of this research is to propose a conceptual and implementation framework to understand the main concepts and relationships in the process of aggregating FS algorithms and to demonstrate how to address FS on datasets with high dimensionality. The proposed conceptual framework is validated by deriving an implementation framework, which incorporates a set of Python packages with functionalities to support the assembly of feature selection algorithms. The performance of the implementation framework was demonstrated in several experiments discovering relevant features in the Sonar, SPECTF, and WDBC datasets. The experiments contrasted the accuracy of two machine learning classifiers (decision tree and logistic regression) trained with subsets of features generated either by single FS algorithms or by the set of features selected by the ensemble feature selection framework. We observed that for the three datasets used (Sonar, SPECTF, and WDBC), the highest precision percentages (86.95%, 74.73%, and 93.85%, respectively) were obtained when the classifiers were trained with the subset of features generated by our framework.
Additionally, the stability of the feature sets generated using our ensemble method was evaluated. The results showed that the method achieved perfect stability for the three datasets used in the evaluation.
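The aggregation idea described in the abstract, combining the outputs of several FS algorithms by consensus, can be sketched in plain Python. The three scorers and the majority-vote rule below are illustrative choices for exposition, not the paper's actual implementation or its packages:

```python
import random

def variance_score(X, y):
    """Score each feature by its variance (the target y is ignored)."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        scores.append(sum((v - mean) ** 2 for v in col) / n)
    return scores

def correlation_score(X, y):
    """Score each feature by |Pearson correlation| with the target."""
    n = len(X)
    my = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        var_x = sum((a - mx) ** 2 for a in col)
        var_y = sum((b - my) ** 2 for b in y)
        scores.append(abs(cov) / ((var_x * var_y) ** 0.5 + 1e-12))
    return scores

def range_score(X, y):
    """Score each feature by its value range, max - min (y is ignored)."""
    return [max(r[j] for r in X) - min(r[j] for r in X)
            for j in range(len(X[0]))]

def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])

def ensemble_select(X, y, selectors, k, min_votes):
    """Each selector votes for its top-k features; keep those with enough votes."""
    votes = [0] * len(X[0])
    for selector in selectors:
        for j in top_k(selector(X, y), k):
            votes[j] += 1
    return sorted(j for j, v in enumerate(votes) if v >= min_votes)

# Synthetic demo: feature 0 tracks the binary target, features 1-2 are noise.
random.seed(0)
y = [i % 2 for i in range(20)]
X = [[yi * 2.0 + random.uniform(-0.1, 0.1),
      random.uniform(-0.1, 0.1),
      random.uniform(-0.1, 0.1)] for yi in y]
selected = ensemble_select(X, y, [variance_score, correlation_score, range_score],
                           k=1, min_votes=2)
print(selected)  # -> [0]: all three selectors agree on the informative feature
```

The consensus threshold (`min_votes`) is the key design knob: requiring agreement from a majority of selectors filters out features that only one biased algorithm favors, which is the motivation for EFS stated in the abstract.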
Pages: 16
Related papers
50 records
  • [1] An Ensemble Feature Selection Framework Integrating Stability
    Zhang, Xiaokang
    Jonassen, Inge
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 2792 - 2798
  • [2] Ensemble Feature Selection Framework for Paddy Yield Prediction in Cauvery Basin using Machine Learning Classifiers
    Sathya, P.
    Gnanasekaran, P.
    COGENT ENGINEERING, 2023, 10 (02):
  • [3] Ensemble Feature Selection for Heart Disease Classification
    Benhar, Houda
    Idri, Ali
    Hosni, Mohamed
    HEALTHINF: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 5: HEALTHINF, 2021, : 369 - 376
  • [4] Classifier ensemble methods in feature selection
    Kiziloz, Hakan Ezgi
    NEUROCOMPUTING, 2021, 419 : 97 - 107
  • [5] Empirical Evaluation of the Ensemble Framework for Feature Selection in DDoS Attack
    Das, Saikat
    Venugopal, Deepak
    Shiva, Sajjan
    Sheldon, Frederick T.
    2020 7TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (CSCLOUD 2020)/2020 6TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (EDGECOM 2020), 2020, : 56 - 61
  • [6] Unsupervised feature selection with ensemble learning
    Elghazel, Haytham
    Aussem, Alex
    MACHINE LEARNING, 2015, 98 (1-2) : 157 - 180
  • [7] A parallel metaheuristic approach for ensemble feature selection based on multi-core architectures
    Hijazi, Neveen Mohammed
    Faris, Hossam
    Aljarah, Ibrahim
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 182
  • [8] Bi-level ensemble method for unsupervised feature selection
    Zhou, Peng
    Wang, Xia
    Du, Liang
    INFORMATION FUSION, 2023, 100
  • [9] Training error and sensitivity-based ensemble feature selection
    Ng, Wing W. Y.
    Tuo, Yuxi
    Zhang, Jianjun
    Kwong, Sam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (10) : 2313 - 2326
  • [10] Explainable feature selection and ensemble classification via feature polarity
    Zhou, Peng
    Liang, Ji
    Yan, Yuanting
    Zhao, Shu
    Wu, Xindong
    INFORMATION SCIENCES, 2024, 676