Variance reduction in purely random forests

被引:53
作者
Genuer, Robin [1 ,2 ]
机构
[1] Univ Bordeaux, ISPED, Ctr INSERM Epidemiol Biostat U897, F-33000 Bordeaux, France
[2] INSERM, ISPED, Ctr INSERM Epidemiol Biostat U897, F-33000 Bordeaux, France
关键词
random forests; nonparametric regression; rates of convergence; randomisation; ensemble methods;
D O I
10.1080/10485252.2012.677843
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Random forests (RFs), introduced by Leo Breiman in 2001, are a very effective statistical method. The complex mechanism of the method makes theoretical analysis difficult. Therefore, simplified versions of RF, called purely RFs (PRF), which can be theoretically handled more easily, have been considered. In this paper, we study the variance of such forests. First, we show a general upper bound which emphasises the fact that a forest reduces the variance. We then introduce a simple variant of PRFs, that we call purely uniformly RFs. For this variant and in the context of regression problems with a one-dimensional predictor space, we show that both random trees and RFs reach minimax rate of convergence. In addition, we prove that compared with random trees, RFs improve accuracy by reducing the estimator variance by a factor of three-fourths.
引用
收藏
页码:543 / 562
页数:20
相关论文
共 12 条
[1]  
[Anonymous], 2001, Computing Science and Statistics
[2]  
ARLOT S, 2008, ARXIV08020566V2
[3]  
Biau G., 2012, J MACHINE L IN PRESS
[4]  
Biau G, 2008, J MACH LEARN RES, V9, P2015
[5]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[6]  
David H. A., 2003, ORDER STAT
[7]   Gene selection and classification of microarray data using random forest -: art. no. 3 [J].
Díaz-Uriarte, R ;
de Andrés, SA .
BMC BIOINFORMATICS, 2006, 7 (1)
[8]   Variable selection using random forests [J].
Genuer, Robin ;
Poggi, Jean-Michel ;
Tuleau-Malot, Christine .
PATTERN RECOGNITION LETTERS, 2010, 31 (14) :2225-2236
[9]   An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings [J].
Goldstein, Benjamin A. ;
Hubbard, Alan E. ;
Cutler, Adele ;
Barcellos, Lisa F. .
BMC GENETICS, 2010, 11
[10]  
Graham R.L., 1989, Concrete Mathematics