Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures

Cited by: 155
Author
Nicodemus, Kristin K. [1 ]
Affiliation
[1] Univ Oxford, Dept Anat Physiol & Genet, MRC Funct Genom Unit, Oxford OX1 3QX, England
Funding
Wellcome Trust (UK)
Keywords
Random forest; variable importance measures; stability; ranking; correlation; linkage disequilibrium;
DOI
10.1093/bib/bbr016
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
A recent study examined the stability of rankings from random forests using two variable importance measures, mean decrease accuracy (MDA) and mean decrease Gini (MDG), and concluded that rankings based on MDG were more robust than those based on MDA. However, few studies have examined how data-specific characteristics affect ranking stability. Rankings based on the MDG measure were sensitive to within-predictor correlation and to differences in category frequencies, even when the number of categories was held constant, and thus may produce spurious results. The MDA measure was robust to these data characteristics. Further, under strong within-predictor correlation, MDG rankings were less stable than MDA rankings.
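The abstract compares how stable predictor rankings are across importance measures but does not state how stability was scored. As a minimal sketch, assuming stability is summarized as the mean pairwise Spearman correlation between rankings from repeated forest runs (an assumption, not necessarily the letter's metric), the idea can be illustrated as follows. The function names and the importance scores are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def to_ranks(scores):
    """Convert importance scores to ranks (1 = most important).

    Ties are broken by predictor index, so the result is always a permutation.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman(r1, r2):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))


def ranking_stability(runs):
    """Mean pairwise Spearman correlation across runs (assumed stability score)."""
    rankings = [to_ranks(scores) for scores in runs]
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical importance scores for 4 predictors over 3 repeated runs.
stable_runs = [
    [0.90, 0.50, 0.10, 0.05],
    [0.80, 0.60, 0.20, 0.10],
    [0.95, 0.40, 0.15, 0.02],
]
unstable_runs = [
    [0.90, 0.50, 0.10, 0.05],
    [0.05, 0.10, 0.50, 0.90],
    [0.50, 0.90, 0.05, 0.10],
]

stable_score = ranking_stability(stable_runs)
unstable_score = ranking_stability(unstable_runs)
print(stable_score, unstable_score)
```

A score near 1 means the runs order the predictors almost identically; lower scores mean the ordering changes from run to run. In practice the per-run scores would come from refitting a forest on resampled data and extracting MDA or MDG values.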
Pages: 369-373
Number of pages: 5
Related papers
9 items in total
[1]   Stability and aggregation of ranked gene lists [J].
Boulesteix, Anne-Laure ;
Slawski, Martin .
BRIEFINGS IN BIOINFORMATICS, 2009, 10 (05) :556-568
[2]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[3]  
Calle M Luz, 2011, Brief Bioinform, V12, P86, DOI 10.1093/bib/bbq011
[4]  
LEISCH F, 2008, WORKING PAPER SERIES
[5]   Performance of random forest when SNPs are in linkage disequilibrium [J].
Meng, Yan A. ;
Yu, Yi ;
Cupples, L. Adrienne ;
Farrer, Lindsay A. ;
Lunetta, Kathryn L. .
BMC BIOINFORMATICS, 2009, 10
[6]   The behaviour of random forest permutation-based variable importance measures under predictor correlation [J].
Nicodemus, Kristin K. ;
Malley, James D. ;
Strobl, Carolin ;
Ziegler, Andreas .
BMC BIOINFORMATICS, 2010, 11
[7]   Predictor correlation impacts machine learning algorithms: implications for genomic studies [J].
Nicodemus, Kristin K. ;
Malley, James D. .
BIOINFORMATICS, 2009, 25 (15) :1884-1890
[8]   Bias in random forest variable importance measures: Illustrations, sources and a solution [J].
Strobl, Carolin ;
Boulesteix, Anne-Laure ;
Zeileis, Achim ;
Hothorn, Torsten .
BMC BIOINFORMATICS, 2007, 8 (1)
[9]   Conditional variable importance for random forests [J].
Strobl, Carolin ;
Boulesteix, Anne-Laure ;
Kneib, Thomas ;
Augustin, Thomas ;
Zeileis, Achim .
BMC BIOINFORMATICS, 2008, 9 (1)