RENT-Repeated Elastic Net Technique for Feature Selection

Cited: 17
Authors
Jenul, Anna [1 ]
Schrunner, Stefan [1 ]
Liland, Kristian Hovde [1 ]
Indahl, Ulf Geir [1 ]
Futsaether, Cecilia Marie [1 ]
Tomic, Oliver [1 ]
Affiliations
[1] Norwegian Univ Life Sci, Fac Sci & Technol, N-1430 As, Norway
Source
IEEE ACCESS | 2021, Vol. 9
Keywords
Feature extraction; Stability criteria; Predictive models; Training; Training data; Task analysis; Data models; Elastic net regularization; exploratory analysis; ensemble feature selection; generalized linear models; selection stability;
DOI
10.1109/ACCESS.2021.3126429
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Feature selection is an essential step in data science pipelines to reduce the complexity associated with large datasets. While much research on this topic focuses on optimizing predictive performance, few studies investigate stability in the context of the feature selection process. In this study, we present the Repeated Elastic Net Technique (RENT) for feature selection. RENT uses an ensemble of generalized linear models with elastic net regularization, each trained on a distinct subset of the training data. Feature selection is based on three criteria that evaluate the weight distributions of features across all elementary models. This leads to the selection of highly stable features that improve the robustness of the final model. Furthermore, unlike established feature selectors, RENT provides valuable information for model interpretation by identifying objects in the data that are difficult to predict during training. In our experiments, we benchmark RENT against six established feature selectors on eight multivariate datasets for binary classification and regression. In the experimental comparison, RENT shows a well-balanced trade-off between predictive performance and stability. Finally, we underline the additional interpretational value of RENT with an exploratory post-hoc analysis of a healthcare dataset.
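The core idea described in the abstract, training an ensemble of elastic-net models on random training subsets and selecting features by the stability of their weights, can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' reference implementation; the subset fraction, the `l1_ratio`, and the single frequency criterion `tau_freq` are simplifying assumptions (the paper combines three criteria on the weight distributions).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rent_sketch(X, y, n_models=50, subset_frac=0.8, tau_freq=0.8, seed=0):
    """Sketch of repeated elastic-net feature selection for binary targets.

    Trains `n_models` elastic-net logistic regressions, each on a random
    subset of the training data, and keeps features whose weight is
    nonzero in at least `tau_freq` of the models (one of several possible
    stability criteria; hypothetical parameter names).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    weights = np.zeros((n_models, p))
    for k in range(n_models):
        # Draw a distinct random subset of the training data
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        model = LogisticRegression(penalty="elasticnet", solver="saga",
                                   l1_ratio=0.5, C=1.0, max_iter=5000)
        model.fit(X[idx], y[idx])
        weights[k] = model.coef_.ravel()
    # Selection frequency: fraction of models with a nonzero weight
    freq = (weights != 0).mean(axis=0)
    selected = np.where(freq >= tau_freq)[0]
    return selected, weights
```

The returned weight matrix also supports the kind of exploratory post-hoc analysis the abstract mentions, e.g. inspecting the sign consistency or spread of each feature's weights across the ensemble.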
Pages: 152333-152346 (14 pages)