On resilient feature selection: Computational foundations of r-C-reducts

Cited by: 23
Authors
Grzegorowski, Marek [1 ]
Slezak, Dominik [1 ]
Affiliations
[1] Univ Warsaw, Inst Informat, Ul Banacha 2, PL-02097 Warsaw, Poland
Keywords
Resilient feature selection; Multivariate feature selection; Rough-set-based approximate reducts; NP-hardness; Heuristic search; CONSTRUCTION; DIAGNOSIS; RELEVANCE;
DOI
10.1016/j.ins.2019.05.041
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
The task of feature selection is crucial for constructing prediction and classification models, resulting in their higher quality and interpretability. However, it is often neglected that some of the selected features may become temporarily unavailable over a long-term timeframe, which can disable a pre-trained model and severely disrupt business continuity. One approach is to rely on a collection of diverse feature subsets, with their corresponding prediction models treated as an ensemble. Another approach is to search for feature sets that are guaranteed to provide sufficient predictive power even if some of their elements are dropped. In this paper, we focus on the latter idea, referring to it as resilient feature selection. We discuss it using the example of the rough-set-based notion of an approximate reduct: an irreducible subset of features providing a satisfactory level of information about the considered target variable. We study the NP-hardness of the problem of finding minimal r-C-reducts, i.e., irreducible subsets of features that assure the aforementioned level, expressed by means of an information-preserving criterion function C, even after disallowing any r features. We discuss opportunities for exhaustive and heuristic search of feature subsets specified in this way. The discussed idea of resilience is more general, and one may consider it an extension of many other, not necessarily rough-set-based, feature selection methods. (C) 2019 Elsevier Inc. All rights reserved.
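The resilience requirement described in the abstract can be illustrated with a small sketch. The helper below checks, by brute force, whether a feature set still satisfies a criterion function C after any r of its elements are removed. The weight-threshold criterion in the example is a hypothetical stand-in for the paper's information-preserving function C, not the rough-set measure itself, and the names (`is_r_resilient`, `crit`) are illustrative rather than taken from the paper.

```python
from itertools import combinations

def is_r_resilient(features, criterion, r):
    """True iff dropping ANY r features from `features` still leaves a
    subset satisfying the information-preserving criterion function."""
    features = list(features)
    if r >= len(features):
        # Removing r features could empty the set entirely.
        return criterion([])
    return all(
        criterion([f for f in features if f not in dropped])
        for dropped in combinations(features, r)
    )

# Hypothetical criterion: total feature "information weight" must stay >= 6.
weights = {"a": 3, "b": 3, "c": 3, "d": 1}
crit = lambda subset: sum(weights[f] for f in subset) >= 6

print(is_r_resilient(["a", "b", "c", "d"], crit, 1))  # True
print(is_r_resilient(["a", "b", "c", "d"], crit, 2))  # False
```

Note that this exhaustive check enumerates all C(|B|, r) removals, which mirrors why the paper studies NP-hardness and heuristic search rather than brute force.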
Pages: 25-44
Page count: 20
Cited references
44 records (first 10 shown)
[1]   Robust biomarker identification for cancer diagnosis with ensemble feature selection methods [J].
Abeel, Thomas ;
Helleputte, Thibault ;
Van de Peer, Yves ;
Dupont, Pierre ;
Saeys, Yvan .
BIOINFORMATICS, 2010, 26 (03) :392-398
[2]  
Altidor Wilker, 2012, International Journal of Business Intelligence and Data Mining, V7, P80, DOI 10.1504/IJBIDM.2012.048729
[3]  
[Anonymous], LECT NOTES COMPUT SC
[4]  
Ben Brahim Afef, 2013, 2013 International Conference on High Performance Computing & Simulation (HPCS), P151, DOI 10.1109/HPCSim.2013.6641406
[5]   A survey on feature selection methods [J].
Chandrashekar, Girish ;
Sahin, Ferat .
COMPUTERS & ELECTRICAL ENGINEERING, 2014, 40 (01) :16-28
[6]   Massively Parallel Feature Extraction Framework Application in Predicting Dangerous Seismic Events [J].
Grzegorowski, Marek .
PROCEEDINGS OF THE 2016 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2016, 8 :225-229
[7]   Approximations and uncertainty measures in incomplete information systems [J].
Dai, Jianhua ;
Xu, Qing .
INFORMATION SCIENCES, 2012, 198 :62-80
[8]  
Das S., 2001, P 18 INT C MACHINE L, P74
[9]   Consistency-based search in feature selection [J].
Dash, M ;
Liu, H .
ARTIFICIAL INTELLIGENCE, 2003, 151 (1-2) :155-176
[10]   Parallelizing feature selection [J].
de Souza, Jerffeson Teixeira ;
Matwin, Stan ;
Japkowicz, Nathalie .
ALGORITHMICA, 2006, 45 (03) :433-456