Regularized and incremental decision trees for data streams

被引:1
作者
Barddal, Jean Paul [1 ]
Enembreck, Fabricio [1 ]
机构
[1] Pontificia Univ Catolica Parana, Grad Program Informat PPGIa, Curitiba, Parana, Brazil
关键词
Data stream mining; Classification; Decision tree; Random forest; Regularization; SELECTION; DRIFT;
D O I
10.1007/s12243-020-00782-3
中图分类号
TN [电子技术、通信技术];
学科分类号
0809 ;
摘要
Decision trees are a widely used family of methods for learning predictive models from both batch and streaming data. Despite depicting positive results in a multitude of applications, incremental decision trees continuously grow in terms of nodes as new data becomes available, i.e., they eventually split on all features available, and also multiple times using the same feature, thus leading to unnecessary complexity and overfitting. With this behavior, incremental trees lose the ability to generalize well, be human-understandable, and be computationally efficient. To tackle these issues, we proposed in a previous study a regularization scheme for Hoeffding decision trees that (i) uses a penalty factor to control the gain obtained by creating a new split node using a feature that has not been used thus far and (ii) uses information from previous splits in the current branch to determine whether the gain observed indeed justifies a new split. In this paper, we extend this analysis and apply the proposed regularization scheme to other types of incremental decision trees and report the results in both synthetic and real-world scenarios. The main interest is to verify whether and how the proposed regularization scheme affects the different types of incremental trees. Results show that in addition to the original Hoeffding Tree, the Adaptive Random Forest also benefits from regularization, yet, McDiarmid Trees and Extremely Fast Decision Trees observe declines in accuracy.
引用
收藏
页码:493 / 503
页数:11
相关论文
共 36 条
  • [1] DATABASE MINING - A PERFORMANCE PERSPECTIVE
    AGRAWAL, R
    IMIELINSKI, T
    SWAMI, A
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1993, 5 (06) : 914 - 925
  • [2] Amezzane Ilham, 2019, Innovations in Smart Cities Applications Edition 2. Proceedings of the Third International Conference on Smart City Applications. Lecture Notes in Intelligent Transportation and Infrastructure (LNITI), P557, DOI 10.1007/978-3-030-11196-0_47
  • [3] [Anonymous], P INT C TOOLS ART IN
  • [4] Bahri M, 2018, IEEE INT CONF BIG DA, P604, DOI 10.1109/BigData.2018.8622178
  • [5] Learning Regularized Hoeffding Trees from Data Streams
    Barddal, Jean Paul
    Enembreck, Fabricio
    [J]. SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 574 - 581
  • [6] Barddal JP, 2016, LECT NOTES COMPUTER
  • [7] Ensembles of Restricted Hoeffding Trees
    Bifet, Albert
    Frank, Eibe
    Holmes, Geoff
    Pfahringer, Bernhard
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2012, 3 (02)
  • [8] Bifet A, 2010, J MACH LEARN RES, V11, P1601
  • [9] Bifet A, 2009, LECT NOTES COMPUT SC, V5772, P249, DOI 10.1007/978-3-642-03915-7_22
  • [10] Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables
    Blackard, JA
    Dean, DJ
    [J]. COMPUTERS AND ELECTRONICS IN AGRICULTURE, 1999, 24 (03) : 131 - 151