Delta Boosting Machine with Application to General Insurance

Cited by: 33
Authors
Lee, Simon C. K. [1]
Lin, Sheldon [2 ]
Affiliations
[1] University of Hong Kong, Department of Statistics and Actuarial Science, Pokfulam, Hong Kong
[2] University of Toronto, Department of Statistical Sciences, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
DOI
10.1080/10920277.2018.1431131
Chinese Library Classification
F8 [Public Finance, Finance]
Subject Classification
0202
Abstract
In this article, we introduce Delta Boosting (DB) as a new member of the boosting family. Like the popular Gradient Boosting (GB), DB is presented as a forward stagewise additive model that attempts to reduce the loss at each iteration by sequentially fitting a simple base learner to complement the running predictions. Instead of relying on the negative gradient, as GB does, DB adopts a new measure called delta as its basis, defined as the loss minimizer at the observation level. We also show that DB is the optimal boosting member for a wide range of loss functions; the optimality follows from DB solving for the split and the adjustment simultaneously to maximize the loss reduction at each iteration. In addition, we introduce an asymptotic version of DB that works well for all twice-differentiable, strictly convex loss functions. This asymptotic behavior does not depend on the number of observations but rather on a large number of iterations, which common regularization techniques naturally encourage. We show that the basis in the asymptotic extension differs from the basis in GB only by a multiple of the second derivative of the log-likelihood. The multiple acts as a correction factor that offsets GB's bias toward observations with high second derivatives. When the negative log-likelihood is used as the loss function, this correction can be interpreted as a credibility adjustment for the process variance. The simulation studies and real-data application we conducted suggest that DB is a significant improvement over GB. The gain from the asymptotic version is less dramatic, but still compelling. Like GB, DB provides high transparency to users: the marginal influence of variables can be reviewed through relative importance charts and partial dependence plots, and overall model performance can be assessed by evaluating the losses, lifts, and double lifts on a holdout sample.
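To make the distinction between the two bases concrete, the sketch below implements one plausible reading of the asymptotic DB iteration for a Poisson negative log-likelihood with a log link, a common setting in general insurance claim-count modeling. The function name delta_boost_poisson, the default parameters, and the zero-count guard are illustrative assumptions, not the authors' implementation. For this loss the gradient is mu - y and the second derivative is mu, so the corrected basis is (y - mu) / mu, whereas plain GB would fit the base learner to y - mu.

```python
# A minimal sketch of the asymptotic DB iteration described in the abstract,
# assuming Poisson negative log-likelihood loss with a log link. Names and
# defaults here are illustrative, not the authors' reference code.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def delta_boost_poisson(X, y, n_iter=200, learning_rate=0.1, max_depth=2):
    # Forward stagewise additive model on the log scale, initialized at the
    # log of the overall mean (assumes y.mean() > 0).
    F = np.full(len(y), np.log(y.mean()))
    for _ in range(n_iter):
        mu = np.exp(F)  # current fitted means
        # Asymptotic DB basis: negative gradient divided by the second
        # derivative of the loss. For Poisson, g = mu - y and h = mu, so the
        # basis is (y - mu) / mu; plain GB would fit the tree to y - mu.
        basis = (y - mu) / mu
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, basis)
        leaves = tree.apply(X)  # terminal-node membership per observation
        # Replace each leaf's fitted value with the loss-minimizing
        # adjustment for that node, log(sum(y) / sum(mu)), so the split and
        # the adjustment jointly maximize the loss reduction this iteration.
        for node in np.unique(leaves):
            idx = leaves == node
            if y[idx].sum() > 0:  # crude guard against log(0) in this sketch
                F[idx] += learning_rate * np.log(y[idx].sum() / mu[idx].sum())
    return F  # final predictions on the log scale
```

Dividing the negative gradient by mu downweights observations with large fitted means, which is precisely the second-derivative bias in GB that the abstract says the correction factor removes.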
Pages: 405-425 (21 pages)