A random forest guided tour

Times cited: 2297
Authors
Biau, Gerard [1, 2]
Scornet, Erwan [1]
Affiliations
[1] Univ Paris 06, Sorbonne Univ, CNRS, LSTA, Boite 158, 4 Pl Jussieu, F-75005 Paris, France
[2] Inst Univ France, Paris, France
Keywords
Random forests; Randomization; Resampling; Parameter tuning; Variable importance; MACHINE LEARNING-METHODS; PROBABILITY ESTIMATION; VARIABLE IMPORTANCE; REGRESSION TREES; GENE SELECTION; CLASSIFICATION; RANKING; CONVERGENCE; CONSISTENCY; RATES;
DOI
10.1007/s11749-016-0481-7
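Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract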
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
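The abstract summarizes the core mechanism: many randomized decision trees, each grown on a resample of the data, whose predictions are averaged. The following pure-Python sketch illustrates that mechanism for regression under stated assumptions. It is not Breiman's original implementation, nor the exact variant analyzed in the review. The sketch assumes bootstrap resampling and a random subset of `mtry` candidate variables drawn at each node. It also assumes CART-style splits chosen by minimizing the within-child sum of squares, and that nodes holding at most `node_size` points are not split further. The names `build_tree`, `predict_tree`, `RandomForest`, `mtry`, and `node_size` are hypothetical choices made for this example.

```python
import random
from statistics import mean

# Illustrative sketch only: every name below is hypothetical, not a library API.


def sse(ys):
    """Sum of squared deviations from the mean (CART regression impurity)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, mtry, node_size, rng):
    """Grow one randomized regression tree.

    A leaf is a float (the mean response in its cell); an internal node is a
    tuple (feature_index, threshold, left_subtree, right_subtree).
    """
    # Stop on pure cells or on cells holding at most node_size points.
    if len(set(y)) == 1 or len(y) <= node_size:
        return mean(y)
    p = len(X[0])
    # Randomization: only mtry randomly chosen variables compete at this node.
    features = rng.sample(range(p), min(mtry, p))
    best = None  # (within-child sum of squares, feature, threshold)
    for j in features:
        values = sorted(set(row[j] for row in X))
        # Midpoints between distinct values keep both children non-empty.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:  # every candidate variable is constant in this cell
        return mean(y)
    _, j, t = best
    L = [i for i, row in enumerate(X) if row[j] <= t]
    R = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            build_tree([X[i] for i in L], [y[i] for i in L], mtry, node_size, rng),
            build_tree([X[i] for i in R], [y[i] for i in R], mtry, node_size, rng))


def predict_tree(node, x):
    """Drop x down the tree and return the mean response of its leaf."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=50, mtry=1, node_size=5, seed=0):
        self.n_trees, self.mtry, self.node_size = n_trees, mtry, node_size
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(y)
        self.trees = []
        for _ in range(self.n_trees):
            # Resampling: each tree sees a bootstrap sample (n draws with replacement).
            idx = [self.rng.randrange(n) for _ in range(n)]
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.mtry, self.node_size, self.rng))
        return self

    def predict(self, x):
        # Aggregation: average the individual tree predictions.
        return mean(predict_tree(tree, x) for tree in self.trees)
```

For example, `RandomForest(n_trees=100, mtry=1).fit(X, y).predict(x)` returns the averaged forecast at a point `x`, where `X` is a list of feature lists and `y` a list of responses. Setting `mtry` equal to the number of variables removes the per-node randomization, so the sketch reduces to bagged regression trees.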
Pages: 197-227
Number of pages: 31