On resilient feature selection: Computational foundations of r-C-reducts

Cited by: 23
Authors
Grzegorowski, Marek [1 ]
Slezak, Dominik [1 ]
Affiliations
[1] Univ Warsaw, Inst Informat, Ul Banacha 2, PL-02097 Warsaw, Poland
Keywords
Resilient feature selection; Multivariate feature selection; Rough-set-based approximate reducts; NP-hardness; Heuristic search; CONSTRUCTION; DIAGNOSIS; RELEVANCE;
DOI
10.1016/j.ins.2019.05.041
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
The task of feature selection is crucial for constructing prediction and classification models, resulting in their higher quality and interpretability. However, it is often neglected that some of the selected features may become temporarily unavailable over a long-term timeframe, which can disable a pre-trained model and severely disrupt business continuity. One approach is to rely on a collection of diverse feature subsets, with their corresponding prediction models treated as an ensemble. Another approach is to search for feature sets that are guaranteed to provide sufficient predictive power even if some of their elements are dropped. In this paper, we focus on the latter idea, referring to it as resilient feature selection. We discuss it using the example of the rough-set-based notion of an approximate reduct: an irreducible subset of features providing a satisfactory level of information about the considered target variable. We study the NP-hardness of the problem of finding minimal r-C-reducts, i.e., irreducible subsets of features that assure the aforementioned level, expressed by means of an information-preserving criterion function C, even after disallowing any r features. We discuss opportunities for exhaustive and heuristic search of feature subsets specified in this way. The discussed idea of resilience is more general, and one may consider it an extension of many other, not necessarily rough-set-based, feature selection methods. (C) 2019 Elsevier Inc. All rights reserved.
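The resilience requirement described in the abstract can be illustrated with a small sketch. The helper below checks, by brute force, whether a feature set still satisfies a criterion function C after any r of its elements are removed. The weight-threshold criterion in the example is a hypothetical stand-in for the paper's information-preserving function C, not the rough-set measure itself, and the names (`is_r_resilient`, `crit`) are illustrative rather than taken from the paper.

```python
from itertools import combinations

def is_r_resilient(features, criterion, r):
    """True iff dropping ANY r features from `features` still leaves a
    subset satisfying the information-preserving criterion function."""
    features = list(features)
    if r >= len(features):
        # Removing r features could empty the set entirely.
        return criterion([])
    return all(
        criterion([f for f in features if f not in dropped])
        for dropped in combinations(features, r)
    )

# Hypothetical criterion: total feature "information weight" must stay >= 6.
weights = {"a": 3, "b": 3, "c": 3, "d": 1}
crit = lambda subset: sum(weights[f] for f in subset) >= 6

print(is_r_resilient(["a", "b", "c", "d"], crit, 1))  # True
print(is_r_resilient(["a", "b", "c", "d"], crit, 2))  # False
```

Note that this exhaustive check enumerates all C(|B|, r) removals, which mirrors why the paper studies NP-hardness and heuristic search rather than brute force.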
Pages: 25-44
Page count: 20
Cited references
44 records (first 10 shown)
[1]   Robust biomarker identification for cancer diagnosis with ensemble feature selection methods [J].
Abeel, Thomas ;
Helleputte, Thibault ;
Van de Peer, Yves ;
Dupont, Pierre ;
Saeys, Yvan .
BIOINFORMATICS, 2010, 26 (03) :392-398
[2]  
Altidor Wilker, 2012, International Journal of Business Intelligence and Data Mining, V7, P80, DOI 10.1504/IJBIDM.2012.048729
[3]  
[Anonymous], LECT NOTES COMPUT SC
[4]  
Ben Brahim Afef, 2013, 2013 International Conference on High Performance Computing & Simulation (HPCS), P151, DOI 10.1109/HPCSim.2013.6641406
[5]   A survey on feature selection methods [J].
Chandrashekar, Girish ;
Sahin, Ferat .
COMPUTERS & ELECTRICAL ENGINEERING, 2014, 40 (01) :16-28
[6]   Massively Parallel Feature Extraction Framework Application in Predicting Dangerous Seismic Events [J].
Grzegorowski, Marek .
PROCEEDINGS OF THE 2016 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2016, 8 :225-229
[7]   Approximations and uncertainty measures in incomplete information systems [J].
Dai, Jianhua ;
Xu, Qing .
INFORMATION SCIENCES, 2012, 198 :62-80
[8]  
Das S., 2001, P 18 INT C MACHINE L, P74
[9]   Consistency-based search in feature selection [J].
Dash, M ;
Liu, H .
ARTIFICIAL INTELLIGENCE, 2003, 151 (1-2) :155-176
[10]   Parallelizing feature selection [J].
de Souza, Jerffeson Teixeira ;
Matwin, Stan ;
Japkowicz, Nathalie .
ALGORITHMICA, 2006, 45 (03) :433-456