Wisdom of Crowds: An Empirical Study of Ensemble-Based Feature Selection Strategies

Cited by: 3
Authors
Susnjak, Teo [1]
Kerry, David [1]
Barczak, Andre [1]
Reyes, Napoleon [1]
Gal, Yaniv [2]
Affiliations
[1] Massey Univ, Auckland, New Zealand
[2] Compac Ltd, Auckland, New Zealand
Source
AI 2015: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2015 / Vol. 9457
Keywords
Ensemble feature selection; Dimensionality reduction; Machine learning; Classification; Data mining; Ensemble classifiers; GENE SELECTION; FILTER; CLASSIFICATION;
DOI
10.1007/978-3-319-26350-2_47
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The accuracy of feature selection methods is affected by both the nature of the underlying datasets and the machine learning algorithms they are combined with. The role these factors play in the final accuracy of the classifiers is generally unknown in advance. This paper presents an ensemble-based feature selection approach that addresses this uncertainty and mitigates the variability in the generalisation of the classifiers. The study conducts extensive experiments with combinations of three feature selection methods on nine datasets, using the selected features to train eight different types of machine learning algorithms. The results confirm that ensemble-based approaches to feature selection tend to produce classifiers that have higher accuracies, are more reliable owing to decreased variance, and are thus more generalisable.
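The abstract does not say which three feature selection methods were used or how their outputs were combined, so the following is only a minimal sketch of the general idea of ensemble-based feature selection. The three scoring functions (`abs_correlation`, `mean_gap`, `fisher_score`), the mean-rank aggregation rule, and the toy data are all assumptions chosen for illustration, not the authors' method.

```python
from statistics import mean, pvariance


def split_by_class(column, labels):
    """Split one feature column into its class-1 and class-0 values."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return pos, neg


def abs_correlation(column, labels):
    """Hypothetical scorer 1: absolute Pearson correlation with the labels."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = sum((x - mx) ** 2 for x in column) ** 0.5
    sy = sum((y - my) ** 2 for y in labels) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def mean_gap(column, labels):
    """Hypothetical scorer 2: absolute difference between the class means."""
    pos, neg = split_by_class(column, labels)
    return abs(mean(pos) - mean(neg))


def fisher_score(column, labels):
    """Hypothetical scorer 3: squared mean gap over within-class variance."""
    pos, neg = split_by_class(column, labels)
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / (spread + 1e-12)


def to_ranks(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, feature in enumerate(order):
        ranks[feature] = position
    return ranks


def ensemble_select(X, y, scorers, k):
    """Rank features with every scorer, average the ranks, keep the best k."""
    columns = list(zip(*X))  # one tuple per feature
    rank_lists = [to_ranks([scorer(col, y) for col in columns])
                  for scorer in scorers]
    mean_rank = [mean([ranks[j] for ranks in rank_lists])
                 for j in range(len(columns))]
    return sorted(range(len(columns)), key=lambda j: mean_rank[j])[:k]


# Toy data (assumed): columns 0 and 1 separate the classes, 2 and 3 do not.
X = [
    [0.1, 1.0, 0.5, 2.0],
    [0.2, 1.2, 0.4, 1.0],
    [0.1, 0.9, 0.6, 2.0],
    [0.3, 1.1, 0.5, 1.0],
    [0.9, 1.6, 0.5, 1.0],
    [1.0, 1.9, 0.6, 2.0],
    [0.8, 1.5, 0.4, 1.0],
    [1.1, 1.8, 0.5, 2.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = ensemble_select(X, y, [abs_correlation, mean_gap, fisher_score], k=2)
print(sorted(selected))
```

Averaging ranks rather than raw scores keeps scorers with different scales comparable, which is one common way to let several selectors vote. On this toy data the two informative columns (indices 0 and 1) are the ones retained.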
Pages: 526-538
Page count: 13