Towards convergence rate analysis of random forests for classification

Cited by: 23
Authors
Gao W. [1 ]
Xu F. [1 ]
Zhou Z.-H. [1 ]
Affiliations
[1] National Key Laboratory for Novel Software Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing
Funding
National Natural Science Foundation of China
Keywords
Classification; Consistency; Convergence; Machine learning; Random forests;
DOI
10.1016/j.artint.2022.103788
Abstract
Random forests are among the most successful ensemble algorithms in machine learning; the basic idea is to construct a large number of random trees individually and make predictions based on the average of their individual predictions. These successes have attracted much attention to the theoretical understanding of random forests, mostly focused on regression problems. This work takes a step towards convergence rates of random forests for classification. We present the first finite-sample rate O(n^{-1/(8d+2)}) on the convergence of purely random forests for binary classification, which can be improved to O(n^{-1/(3.87d+2)}) by considering a midpoint splitting mechanism. We introduce another variant of random forests, which follows Breiman's original random forests but with different mechanisms for choosing splitting dimensions and positions. We present the convergence rate O(n^{-1/(d+2)} (ln n)^{1/(d+2)}) for this variant, which matches the minimax rate of the optimal plug-in classifier under the L-Lipschitz assumption, up to a factor of (ln n)^{1/(d+2)}. We achieve the tighter convergence rate O(ln n / n) under certain assumptions on structured data. This work also takes a step towards the convergence rate of random forests for multi-class learning, and presents the same convergence rates for multi-class learning as for binary classification, yet with different constants. We finally provide empirical studies to support the theoretical analysis. © 2022 Elsevier B.V.
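The purely random forests analyzed in the abstract grow each tree without looking at the data: splits choose a random dimension and either a uniformly random position or the cell midpoint, and the forest averages the trees' leaf estimates before thresholding. The following is a minimal Python sketch of that idea for binary classification on [0,1]^d; the function names, the depth parameter, and the 1/2 vote for empty leaves are illustrative choices, not details taken from the paper.

```python
import random

def grow_purely_random_tree(depth, d, lo=None, hi=None, midpoint=False):
    """Recursively partition [0,1]^d; splits ignore the training data,
    which is the defining property of *purely* random forests."""
    lo = lo if lo is not None else [0.0] * d
    hi = hi if hi is not None else [1.0] * d
    if depth == 0:
        return {"leaf": True, "lo": lo, "hi": hi}
    dim = random.randrange(d)  # splitting dimension chosen uniformly at random
    # midpoint splitting mechanism vs. uniformly random split position
    pos = (lo[dim] + hi[dim]) / 2 if midpoint else random.uniform(lo[dim], hi[dim])
    left_hi, right_lo = hi.copy(), lo.copy()
    left_hi[dim], right_lo[dim] = pos, pos
    return {"leaf": False, "dim": dim, "pos": pos,
            "left": grow_purely_random_tree(depth - 1, d, lo, left_hi, midpoint),
            "right": grow_purely_random_tree(depth - 1, d, right_lo, hi, midpoint)}

def tree_predict(node, X, y, x):
    """Average of training labels in x's leaf cell (1/2 if the cell is empty)."""
    while not node["leaf"]:
        node = node["left"] if x[node["dim"]] <= node["pos"] else node["right"]
    inside = [yi for xi, yi in zip(X, y)
              if all(l <= v <= h for v, l, h in zip(xi, node["lo"], node["hi"]))]
    return sum(inside) / len(inside) if inside else 0.5

def forest_predict(trees, X, y, x):
    """Plurality vote over the averaged tree estimates (labels in {0, 1})."""
    avg = sum(tree_predict(t, X, y, x) for t in trees) / len(trees)
    return 1 if avg >= 0.5 else 0
```

Breiman's original random forests instead choose splits by optimizing an impurity criterion over a random feature subset; the paper's variant keeps that data-dependent flavor but randomizes the splitting dimensions and positions differently, which is what enables the near-minimax rate stated above.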