EFS: an ensemble feature selection tool implemented as R-package and web-application

Cited by: 69
Authors
Neumann, Ursula [1, 2, 3]
Genze, Nikita [1, 2]
Heider, Dominik [1, 2, 3]
Affiliations
[1] Straubing Ctr Sci, Schulgasse 22, D-94315 Straubing, Germany
[2] Univ Appl Sci, D-85354 Freising Weihenstephan, Germany
[3] Tech Univ Munich, Wissensch Zentrum Weihenstephan, D-85354 Freising Weihenstephan, Germany
Keywords
Machine learning; Feature selection; Ensemble learning; R-package; Classification; Performance
DOI
10.1186/s13040-017-0142-8
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: Feature selection methods aim to identify a subset of features that improves the prediction performance of subsequent classification models and thereby also makes those models easier to interpret. Previous studies have demonstrated that single feature selection methods can have specific biases, whereas ensemble feature selection has the advantage of alleviating and compensating for these biases.
Results: The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs into a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated into EFS, which can be used separately or combined in an ensemble.
Conclusion: EFS identifies relevant features while compensating for the specific biases of single methods through an ensemble approach. EFS can thereby improve the prediction accuracy and interpretability of subsequent binary classification models.
Availability: EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
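The idea of combining normalized outputs into an ensemble importance can be illustrated with a minimal sketch. It is not the EFS implementation: the method names, feature names, raw scores, the sum-to-one normalization, and the equal weighting of methods are all assumptions chosen for illustration, and the abstract does not specify how EFS normalizes or weights its eight methods.

```python
def normalize(scores):
    """Scale non-negative scores so they sum to 1 (all-zero input stays zero)."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: value / total for name, value in scores.items()}


def ensemble_importance(method_scores):
    """Average each feature's normalized score over all methods (equal weights)."""
    n_methods = len(method_scores)
    features = next(iter(method_scores.values())).keys()
    combined = {feature: 0.0 for feature in features}
    for scores in method_scores.values():
        for feature, value in normalize(scores).items():
            combined[feature] += value / n_methods
    return combined


# Hypothetical raw importances from three made-up selection methods.
# The scales differ on purpose: normalization makes them comparable.
method_scores = {
    "method_a": {"f1": 0.9, "f2": 0.1, "f3": 0.0},
    "method_b": {"f1": 30.0, "f2": 50.0, "f3": 20.0},
    "method_c": {"f1": 2.0, "f2": 1.0, "f3": 1.0},
}

importance = ensemble_importance(method_scores)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

In this toy input, method_b alone would rank f2 first, but the other two methods outweigh it in the combined score, which is the kind of bias compensation the abstract describes.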
Pages: 9