A filter feature selection for high-dimensional data

Cited by: 9
Authors
Janane, Fatima Zahra [1]
Ouaderhman, Tayeb [1]
Chamlal, Hasna [1]
Affiliations
[1] Hassan II Univ, Fac Sci Ain Chock, Dept Math & Informat, Fundamental & Appl Math Lab, Km 8 Route El Jadida,BP 5366 Maarif, Casablanca 20100, Morocco
Keywords
Relief; Technique for Order Preference by Similarity to Ideal Solution; feature selection; high-dimensional data; feature ranking; CLASSIFICATION; ALGORITHMS; RELIEFF
DOI
10.1177/17483026231184171
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In a classification problem, it is important to identify the informative features before building a prediction model, rather than using tens or thousands of features, which may penalize some learning methods and increase the risk of over-fitting. Feature selection addresses these problems. In this article, we propose a new filter method for feature selection that combines the Relief filter algorithm with the multi-criteria decision-making method TOPSIS (Technique for Order Preference by Similarity to Ideal Solution), modeling the feature selection task as a multi-criteria decision problem. Using the Relief methodology, a decision matrix is computed and passed to TOPSIS, which ranks the features from best to worst. To evaluate the performance of the proposed approach, a simulation study comprising a set of experiments and case studies was conducted on three synthetic dataset scenarios. The results confirm the effectiveness of the proposed filter in detecting the most informative features.
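The pipeline outlined in the abstract (Relief-derived decision matrix, then a TOPSIS ranking) can be illustrated with a minimal Python sketch. The abstract does not say how the decision matrix is built, so the construction here is an assumption made only for illustration: each feature is an alternative with two criteria, its mean nearest-miss difference (a benefit criterion) and its mean nearest-hit difference (a cost criterion). The function names `relief_decision_matrix`, `topsis`, and `rank_features`, the equal criterion weights, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def relief_decision_matrix(X, y, n_iter=50, seed=0):
    """One row per feature: [mean nearest-miss diff, mean nearest-hit diff].

    Assumption: these two Relief-style quantities act as TOPSIS criteria;
    the paper's decision matrix may be defined differently.
    Requires at least two samples in every class.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Feature ranges scale every per-feature difference into [0, 1].
    spans = []
    for j in range(p):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(p))

    miss, hit = [0.0] * p, [0.0] * p
    for _ in range(n_iter):
        i = rng.randrange(n)  # randomly sampled instance
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        nearest_hit = min(same, key=lambda k: dist(X[i], X[k]))
        nearest_miss = min(other, key=lambda k: dist(X[i], X[k]))
        for j in range(p):
            hit[j] += diff(X[i], X[nearest_hit], j) / n_iter
            miss[j] += diff(X[i], X[nearest_miss], j) / n_iter
    return [[miss[j], hit[j]] for j in range(p)]


def topsis(matrix, benefit, weights=None):
    """Closeness to the ideal solution per alternative (row); higher is better."""
    m = len(matrix[0])
    weights = weights or [1.0 / m] * m  # equal weights are an assumption
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(r[c] ** 2 for r in matrix)) or 1.0 for c in range(m)]
    v = [[weights[c] * r[c] / norms[c] for c in range(m)] for r in matrix]
    cols = list(zip(*v))
    ideal = [max(col) if benefit[c] else min(col) for c, col in enumerate(cols)]
    anti = [min(col) if benefit[c] else max(col) for c, col in enumerate(cols)]
    scores = []
    for row in v:
        d_pos, d_neg = math.dist(row, ideal), math.dist(row, anti)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


def rank_features(X, y, n_iter=50, seed=0):
    """Feature indices ordered from best to worst TOPSIS score."""
    matrix = relief_decision_matrix(X, y, n_iter, seed)
    scores = topsis(matrix, benefit=[True, False])
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # [0, 1]
```

On this toy data the class-separating feature is ranked first. Treating the hit and miss terms as separate criteria, instead of subtracting them as in the classical Relief weight, is just one way to obtain a multi-column decision matrix for TOPSIS.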
Pages: 14