A New Permutation-Based Method for Ranking and Selecting Group Features in Multiclass Classification

Cited by: 1
Authors
Zubair, Iqbal Muhammad [1 ]
Lee, Yung-Seop [2 ]
Kim, Byunghoon [1 ]
Affiliations
[1] Hanyang Univ, Dept Ind & Management Engn, Ansan 15588, South Korea
[2] Dongguk Univ, Dept Stat, Seoul 04620, South Korea
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 08
Funding
National Research Foundation of Singapore;
Keywords
group feature; feature selection; permutation; multiclass classification; ALGORITHMS; GENES;
DOI
10.3390/app14083156
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Code
0703;
Abstract
Group feature selection reduces model complexity by retaining the most informative group features and discarding the less significant ones. Existing group feature selection methods return a set of important group features but do not report the relative importance of all group features, and few consider that relative importance during the selection process. This study introduces a permutation-based group feature selection approach designed for high-dimensional multiclass datasets. First, the least absolute shrinkage and selection operator (lasso) is applied within each group feature to eliminate irrelevant individual features. Then, the relative importance of each group feature is computed with a random-forest-based permutation method, and the most significant group features are selected accordingly. The proposed method was evaluated with machine learning classifiers and compared with existing approaches such as group lasso on real-world, high-dimensional, multiclass microarray datasets. The results show that the proposed method not only selects significant group features but also provides the relative importance and ranking of all group features, and it outperformed the existing methods in terms of accuracy and F1 score.
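The abstract outlines a two-stage procedure: lasso-based screening of individual features within each group, followed by a random-forest permutation step that scores and ranks whole groups. The sketch below is a minimal, hedged illustration of that pipeline on synthetic data; the group layout, the L1-penalized logistic regression screener, the joint group-permutation scoring, and all hyperparameters are assumptions made for demonstration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): lasso screening within groups,
# then ranking groups by the accuracy drop when a whole group is permuted.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic multiclass data; an assumed layout of 5 groups x 20 features.
X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           n_classes=3, random_state=0)
groups = {g: list(range(g * 20, (g + 1) * 20)) for g in range(5)}

# Stage 1: within each group, keep features with nonzero coefficients in an
# L1-penalized (lasso-type) multinomial logistic regression.
kept = {}
for g, idx in groups.items():
    screener = LogisticRegression(penalty="l1", solver="saga", C=0.1,
                                  max_iter=5000).fit(X[:, idx], y)
    nonzero = np.flatnonzero(np.any(screener.coef_ != 0, axis=0))
    kept[g] = [idx[j] for j in nonzero]

# Stage 2: fit a random forest on all retained features, then score each group
# by the mean drop in held-out accuracy when its features are permuted jointly.
cols = [j for idx in kept.values() for j in idx]
X_tr, X_te, y_tr, y_te = train_test_split(X[:, cols], y, random_state=0,
                                          stratify=y)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
baseline = rf.score(X_te, y_te)

rng = np.random.default_rng(0)
group_importance = {}
for g, idx in kept.items():
    if not idx:                      # every feature in the group screened out
        group_importance[g] = 0.0
        continue
    pos = [cols.index(j) for j in idx]
    drops = []
    for _ in range(20):              # repeat permutations to reduce variance
        X_perm = X_te.copy()
        X_perm[:, pos] = X_te[rng.permutation(len(X_te))][:, pos]
        drops.append(baseline - rf.score(X_perm, y_te))
    group_importance[g] = float(np.mean(drops))

# Rank all groups by importance; a cut-off on this ranking selects groups.
for g, score in sorted(group_importance.items(), key=lambda kv: -kv[1]):
    print(f"group {g}: permutation importance {score:.4f}")
```

Permuting all of a group's retained features with the same row permutation keeps within-group correlations intact while breaking the group's association with the labels, which is one common reading of group-level permutation importance; the paper itself should be consulted for the exact scoring rule used.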
Pages: 16