On the Capability of Classification Trees and Random Forests to Estimate Probabilities

Cited by: 1
Authors
Plante, Jean-Francois [1]
Radatz, Marisa [1]
Affiliations
[1] HEC Montreal, Dept Decis Sci, 3000 Chemin Cote Sainte Catherine, Montreal, PQ H3T 2A7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Probability estimates; Binary classification; Decision trees; Random forests; Logistic regression; Consistency; Monte Carlo simulations; Propensity score estimation; Calibration
DOI
10.1007/s42519-024-00376-5
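Chinese Library Classification (CLC) codes
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract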
With the rising popularity of artificial intelligence, machine learning algorithms are being considered for an increasing number of problems. For binary classification, most algorithms can provide an estimate of the probability that an event will occur, but the statistical properties thereof are often unknown. After reviewing convergence results for classification trees and random forests in the literature, we discuss how some methods could be negatively impacted by poor probability estimates. We design an extensive Monte Carlo simulation inspired by nine datasets to evaluate the ability of different algorithms to estimate probabilities. We find that while trees and forests may perform better at ranking, their ability to estimate probabilities rarely exceeds that of logistic regression, even when the logistic regression is misspecified.
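The distinction the abstract draws between ranking and probability estimation can be illustrated with a small sketch. This is not the authors' simulation design: the one-covariate logistic data-generating model, the sample size, and the sharpening factor of 5 are all assumptions chosen for illustration. The sketch compares the true probabilities with an overconfident but order-preserving transformation of them. Both orderings give essentially the same AUC (a ranking measure), while the Brier score (a measure of probability accuracy) is worse for the overconfident scores.

```python
import math
import random


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


random.seed(0)
n = 1000  # assumed sample size, for illustration only

# Assumed data-generating model: one standard normal covariate, logistic link.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
true_p = [sigmoid(xi) for xi in x]
y = [1 if random.random() < pi else 0 for pi in true_p]

# Overconfident scores: same ordering as true_p, but pushed toward 0 and 1.
overconfident_p = [sigmoid(5.0 * xi) for xi in x]

auc_true = auc(y, true_p)
auc_over = auc(y, overconfident_p)
brier_true = brier_score(y, true_p)
brier_over = brier_score(y, overconfident_p)

print(f"AUC   true: {auc_true:.3f}  overconfident: {auc_over:.3f}")
print(f"Brier true: {brier_true:.3f}  overconfident: {brier_over:.3f}")
```

Because AUC depends only on the ordering of the scores, a method can rank observations well and still return probabilities that are far from the truth, which is why the two abilities need to be evaluated separately.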
Pages: 22