A Hybrid Embedded-Filter Method for Improving Feature Selection Stability of Random Forests

Cited by: 2
Authors
Jerbi, Wassila [1]
Ben Brahim, Afef [2]
Essoussi, Nadia [1]
Affiliations
[1] Univ Tunis, Inst Super Gest, LARODEC, Ave Liberte, Le Bardo 2000, Tunisia
[2] Univ Tunis, Tunis Business Sch, LARODEC, El Mourouj 2074, Tunisia
Source
PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2016) | 2017 / Vol. 552
Keywords
Stability; Feature selection; Classification; High dimensional data; Random forests;
DOI
10.1007/978-3-319-52941-7_37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many domains deal with high-dimensional data, in which the number of observations is small compared to the number of features. Feature selection is frequently used as a pre-processing step to make mining such data more efficient. A key issue in feature selection is stability, that is, the sensitivity of the selected features to variations in the training set. Random forests are classification algorithms that are also regarded as embedded feature selection methods, because selection occurs within the learning algorithm itself. However, this method suffers from unstable selection. The purpose of our work is to investigate the classification and feature selection properties of random forests, with a particular focus on enhancing the stability of this algorithm as an embedded feature selection method. A hybrid filter-embedded version of the algorithm is proposed, and the results show its efficiency.
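To make the notion of selection stability concrete, here is a minimal sketch that scores how much the selected feature subsets agree across resampled training sets. The abstract does not say which stability measure the authors use; mean pairwise Jaccard similarity is assumed here as one common choice, and the function names and the example subsets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets):
    """Mean pairwise Jaccard similarity over all selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected on three resampled training sets.
runs = [{0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 5, 6}]
stability = selection_stability(runs)
```

A score near 1 means the same features are selected regardless of the training sample; a score near 0 means the selection changes almost entirely from one sample to the next.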
Pages: 370-379
Page count: 10