Random Forests for Regression as a Weighted Sum of k-Potential Nearest Neighbors

被引:13
作者
Fernandez-Gonzalez, Pablo [1 ]
Bielza, Concepcion [1 ]
Larranaga, Pedro [1 ]
机构
[1] Tech Univ Madrid, Madrid 28660, Spain
关键词
Random forests; regression; bagging; bootstrap; nearest neighbors; k-potential nearest neighbors;
D O I
10.1109/ACCESS.2019.2900755
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we tackle the problem of random forests for regression expressed as weighted sums of datapoints. We study the theoretical behavior of k-potential nearest neighbors (k-PNNs) under bagging and obtain an upper bound on the weights of a datapoint for random forests with any type of splitting criterion, provided that we use unpruned trees that stop growing only when there are k or less datapoints at their leaves. Moreover, we use the previous bound together with the concept of b-terms (i.e., bootstrap terms) introduced in this paper, to derive the explicit expression of weights for datapoints in a random (k-PNNs) selection setting, a datapoint selection strategy that we also introduce and to build a framework to derive other bagged estimators using a similar procedure. Finally, we derive from our framework the explicit expression of weights of a regression estimate equivalent to a random forest regression estimate with the random splitting criterion and demonstrate its equivalence both theoretically and practically.
引用
收藏
页码:25660 / 25672
页数:13
相关论文
共 19 条
[1]   Maxima in hypercubes [J].
Bai, ZD ;
Devroye, L ;
Hwang, HK ;
Tsai, TH .
RANDOM STRUCTURES & ALGORITHMS, 2005, 27 (03) :290-309
[2]   ON DISTRIBUTION OF NUMBER OF ADMISSIBLE POINTS IN A VECTOR RANDOM SAMPLE [J].
BARNDORF.O ;
SOBEL, M .
THEORY OF PROBILITY AND ITS APPLICATIONS,USSR, 1966, 11 (02) :249-&
[3]   A random forest guided tour [J].
Biau, Gerard ;
Scornet, Erwan .
TEST, 2016, 25 (02) :197-227
[4]   On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification [J].
Biau, Gerard ;
Devroye, Luc .
JOURNAL OF MULTIVARIATE ANALYSIS, 2010, 101 (10) :2499-2518
[5]  
Biau G, 2010, J MACH LEARN RES, V11, P687
[6]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[7]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[8]  
Bühlmann P, 2002, ANN STAT, V30, P927
[9]  
Caprile B, 2004, LECT NOTES COMPUT SC, V3077, P72
[10]  
Devroye L., 1996, A Probabilistic Theory of Pattern Recognition, V31, DOI 10.1007/978-1-4612-0711-5