A random forest guided tour

Cited by: 2350
Authors
Biau, Gerard [1, 2]
Scornet, Erwan [1]
Affiliations
[1] Univ Paris 06, Sorbonne Univ, CNRS, LSTA, Boîte 158, 4 Pl Jussieu, F-75005 Paris, France
[2] Inst Univ France, Paris, France
Keywords
Random forests; Randomization; Resampling; Parameter tuning; Variable importance; MACHINE LEARNING-METHODS; PROBABILITY ESTIMATION; VARIABLE IMPORTANCE; REGRESSION TREES; GENE SELECTION; CLASSIFICATION; RANKING; CONVERGENCE; CONSISTENCY; RATES
DOI
10.1007/s11749-016-0481-7
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
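The abstract summarizes the mechanism in one sentence: several randomized decision trees, each grown on a resample of the data, with their predictions aggregated by averaging. Below is a minimal pure-Python sketch of that idea for regression. It is an illustration written for this record, not the authors' code. Assumptions: all function names are hypothetical; the argument names `ntree`, `mtry` and `nodesize` are borrowed from the R `randomForest` package, with `nodesize` used here to mean that a cell with fewer points is not split; the randomization is bootstrap resampling plus `mtry` randomly drawn candidate coordinates per node; and `permutation_importance` is a simplified variable-importance measure computed on the supplied data rather than on Breiman's out-of-bag samples.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (the CART regression split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, mtry, nodesize, rng):
    """Grow one randomized regression tree (hypothetical helper).

    A leaf is the mean response of its cell; an internal node is a tuple
    (feature, threshold, left_subtree, right_subtree).
    """
    # Cells with fewer than `nodesize` points, or a constant response, stay unsplit.
    if len(y) < nodesize or len(set(y)) == 1:
        return mean(y)
    best = None  # (criterion, feature, threshold)
    # Randomization: only `mtry` coordinates, drawn without replacement, are tried.
    for j in rng.sample(range(len(X[0])), mtry):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # cut halfway between consecutive observed values
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            crit = sse(left) + sse(right)
            if best is None or crit < best[0]:
                best = (crit, j, t)
    if best is None:  # every drawn coordinate is constant in this cell
        return mean(y)
    _, j, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, nodesize, rng)

    return (j, t,
            grow([i for i, xi in enumerate(X) if xi[j] <= t]),
            grow([i for i, xi in enumerate(X) if xi[j] > t]))


def fit_forest(X, y, ntree=30, mtry=1, nodesize=5, seed=0):
    """Grow `ntree` trees, each on a bootstrap resample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, nodesize, rng))
    return forest


def predict_tree(node, x):
    """Drop x down the tree until a leaf (a number) is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def predict_forest(forest, x):
    """Aggregate by averaging the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)


def permutation_importance(forest, X, y, j, seed=0):
    """Increase in mean squared error after randomly permuting column j.

    Simplified: evaluated on the given data, not on per-tree out-of-bag samples.
    """
    def mse(data):
        return mean((predict_forest(forest, x) - yi) ** 2 for x, yi in zip(data, y))

    col = [row[j] for row in X]
    random.Random(seed).shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    return mse(X_perm) - mse(X)


if __name__ == "__main__":
    # Toy data (assumption): y jumps at x0 = 20.5, while x1 is pure noise.
    X = [[i, (7 * i) % 40] for i in range(40)]
    y = [10.0 if row[0] > 20 else 0.0 for row in X]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [2, 0]), predict_forest(forest, [38, 0]))
    print(permutation_importance(forest, X, y, 0), permutation_importance(forest, X, y, 1))
```

Setting `mtry` to the number of coordinates removes the split-level randomization and leaves bootstrap resampling as the only source of diversity between trees, which is plain bagging.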
Pages: 197-227
Number of pages: 31
Related papers (100 in total)
[31] Biau, Gerard; Devroye, Luc. On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. Journal of Multivariate Analysis, 2010, 101(10): 2499-2518.
[32] Biau, G. Journal of Machine Learning Research, 2008, 9: 2015.
[33] Blewitt, Marnie E.; Gendrel, Anne-Valerie; Pang, Zhenyi; Sparrow, Duncan B.; Whitelaw, Nadia; Craig, Jeffrey M.; Apedaile, Anwyn; Hilton, Douglas J.; Dunwoodie, Sally L.; Brockdorff, Neil; Kay, Graham F.; Whitelaw, Emma. SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation. Nature Genetics, 2008, 40(5): 663-669.
[34] Boulesteix, Anne-Laure; Janitza, Silke; Kruppa, Jochen; Koenig, Inke R. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012, 2(6): 493-507.
[35] Breiman, L. Random forests. Machine Learning, 2001, 45(1): 5-32.
[37] Breiman, L. Randomizing outputs to increase prediction accuracy. Machine Learning, 2000, 40(3): 229-242.
[38] Breiman, L. SETTING USING UNDERS, 2003.
[39] Bühlmann, P. Annals of Statistics, 2002, 30: 927.
[40] Clémençon, S. Journal of Machine Learning Research, 2013, 14: 39.