Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion

Cited by: 5
Authors
Ben Brahim, Afef [1]
Affiliations
[1] Univ Tunis, Tunis Business Sch, LARODEC, Bir El Kassaa, Tunisia
Keywords
Feature selection; High dimensionality; Instance-based learning; Stability; CANCER; PREDICTION; RELEVANCE
DOI
10.1007/s00521-020-04971-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is frequently used as a preprocessing step in data mining and is attracting growing attention as the volume of data emerging from different domains increases. High data dimensionality increases noise and thus the error of learning algorithms. Filter methods for feature selection are particularly fast and useful for high-dimensional datasets. Existing methods focus on producing feature subsets that improve predictive performance, but they often suffer from instability. Instance-based filters, for example, are considered among the most effective methods; they rank features based on the neighborhoods of instances. However, because the feature weights fluctuate with the instances, small changes in the training data result in a different selected subset of features. On the other hand, some other filters generate stable results but achieve only modest predictive performance. The absence of a trade-off between stability and classification accuracy decreases the reliability of feature selection results. To address this issue, we propose filter methods that improve the stability of feature selection while preserving optimal predictive accuracy and without increasing the complexity of the feature selection algorithms. The proposed approaches first use the strength of instance learning to identify initial sets of relevant features and then, in a second stage, use aggregation techniques to increase the stability of the final set. Two classification algorithms are used to evaluate the predictive performance of the proposed instance-based filters against state-of-the-art algorithms. The results show that our methods improve both classification accuracy and feature selection stability for high-dimensional datasets.
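The two-stage idea in the abstract (instance-based relevance scoring, then aggregation for stability) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the basic Relief filter as a stand-in for the instance-based stage and simple selection-frequency voting over random subsamples as a stand-in for the aggregation stage, and it omits redundancy elimination. The function names `relief_weights`, `aggregate_subsets`, and `jaccard`, all parameter values, and the synthetic data are hypothetical.

```python
import numpy as np


def relief_weights(X, y, rng, n_iter=50):
    """Basic Relief (assumed stand-in for an instance-based filter).

    Rewards features that differ between an instance and its nearest miss
    and penalizes features that differ from its nearest hit.
    Assumes every class has at least two instances in X.
    """
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - X.min(axis=0)) / span  # min-max scale each feature to [0, 1]
    w = np.zeros(d)
    for i in rng.integers(0, n, size=n_iter):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # L1 distance to every instance
        dist[i] = np.inf  # exclude the sampled instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_iter


def aggregate_subsets(X, y, k, n_rounds=20, frac=0.8, seed=0):
    """Select top-k features on several subsamples, then keep the k features
    chosen most often (frequency voting, an assumed aggregation rule)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        w = relief_weights(X[idx], y[idx], rng)
        counts[np.argsort(w)[-k:]] += 1  # vote for this round's top-k features
    selected = np.argsort(-counts, kind="stable")[:k]
    return selected, counts


def jaccard(a, b):
    """Jaccard similarity between two feature subsets, a common stability proxy."""
    a, b = set(map(int, a)), set(map(int, b))
    return len(a & b) / len(a | b)


# Synthetic data (hypothetical): 60 instances, 30 features, only the first
# three features carry class information.
data_rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
X = data_rng.normal(size=(60, 30))
X[:, :3] += 4.0 * y[:, None]

selected, counts = aggregate_subsets(X, y, k=3)
print(sorted(selected.tolist()), jaccard(selected, [0, 1, 2]))
```

Voting over subsamples is only one possible aggregation rule; rank averaging or weight averaging across rounds would fit the same skeleton, and the paper's own fusion strategy may differ from all of these.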
Pages: 1221-1232
Number of pages: 12