Gradient boosted trees for evolving data streams

Cited by: 0
Authors
Nuwan Gunasekara
Bernhard Pfahringer
Heitor Gomes
Albert Bifet
Affiliations
[1] AI Institute, University of Waikato
[2] Victoria University of Wellington
[3] LTCI, Télécom Paris, IP Paris
Source
Machine Learning | 2024, Vol. 113
Keywords
Gradient boosting; Stream learning; Gradient boosted trees; Concept drift
DOI
Not available
Abstract
Gradient boosting is a widely used machine learning technique that has proven highly effective in batch learning. However, its effectiveness in stream learning contexts lags behind bagging-based ensemble methods, which currently dominate the field. One reason for this discrepancy is the challenge of adapting the booster to a new concept after a concept drift. Resetting the entire booster can lead to significant performance degradation while it relearns the new concept. Resetting only some parts of the booster can be more effective, but identifying which parts to reset is difficult, given that each boosting step builds on the previous prediction. To overcome these difficulties, we propose Streaming Gradient Boosted Trees (SGBT), which is trained using the weighted squared loss elicited in XGBoost. SGBT exploits trees with a replacement strategy to detect and recover from drifts, thus enabling the ensemble to adapt without sacrificing predictive performance. Our empirical evaluation of SGBT on a range of streaming datasets with challenging drift scenarios demonstrates that it outperforms current state-of-the-art methods for evolving data streams.
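For intuition, below is a minimal, self-contained sketch of the two ideas the abstract names: each boosting step is trained on the XGBoost-style target (negative gradient divided by the hessian of the logistic loss, with the hessian as the instance weight in a squared loss), and each step carries its own drift detector so that only the drifting member is replaced rather than the whole booster. This is not the paper's implementation: the streaming regression trees are swapped for a throwaway online linear learner, the detector is a simplified DDM-style test, and all names (`StreamingGradientBooster`, `n_steps`, `shrinkage`) are illustrative.

```python
import math
import random


class OnlineLinearRegressor:
    """Stand-in incremental base learner: SGD on a weighted squared loss.
    SGBT itself uses streaming regression trees; a linear model keeps
    this sketch short and dependency-free."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def learn(self, x, target, weight):
        # One SGD step on weight * (prediction - target)^2, i.e. the
        # hessian-weighted squared loss of XGBoost-style boosting.
        grad = 2.0 * weight * (self.predict(x) - target)
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * grad * xi
        self.b -= self.lr * grad


class DriftDetector:
    """Simplified DDM-style detector over a 0/1 error stream."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.n, self.p = 0, 1.0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):
        self.n += 1
        self.p += (error - self.p) / self.n
        if self.n < 30:                                # warm-up period
            return False
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        return self.p + s > self.p_min + 3.0 * self.s_min


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


class StreamingGradientBooster:
    """Each boosting step fits the hessian-weighted residual of the
    logistic loss; a per-step drift detector replaces only the drifting
    member instead of resetting the whole booster."""

    def __init__(self, n_features, n_steps=10, shrinkage=0.3):
        self.n_features = n_features
        self.shrinkage = shrinkage
        self.learners = [OnlineLinearRegressor(n_features) for _ in range(n_steps)]
        self.detectors = [DriftDetector() for _ in range(n_steps)]

    def predict_proba(self, x):
        return sigmoid(sum(self.shrinkage * h.predict(x) for h in self.learners))

    def learn_one(self, x, y):                         # y in {0, 1}
        partial = 0.0                                  # score of the steps so far
        for k, h in enumerate(self.learners):
            p = sigmoid(partial)
            g, hess = p - y, max(p * (1.0 - p), 1e-6)  # logistic grad / hessian
            h.learn(x, target=-g / hess, weight=hess)  # XGBoost-style target
            partial += self.shrinkage * h.predict(x)
            # Monitor the prefix ensemble up to this step; on drift,
            # replace only this step's learner and restart its detector.
            if self.detectors[k].update(int((partial > 0.0) != (y == 1))):
                self.learners[k] = OnlineLinearRegressor(self.n_features)
                self.detectors[k].reset()


# Toy prequential (test-then-train) run: the decision boundary of a
# 2-d stream flips at t = 2000, i.e. one abrupt concept drift.
random.seed(1)
model, correct = StreamingGradientBooster(n_features=2), 0
for t in range(4000):
    x = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    y = int((x[0] + x[1] > 0.0) != (t >= 2000))
    correct += int((model.predict_proba(x) > 0.5) == (y == 1))
    model.learn_one(x, y)
print(f"prequential accuracy: {correct / 4000:.3f}")
```

On this toy stream the per-step replacement lets the booster recover after the flip without discarding members that are still useful; the paper's SGBT applies the same idea with streaming regression trees and evaluates it prequentially on streaming datasets with challenging drift scenarios.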
Pages: 3325–3352 (27 pages)