Hybrid decision trees for data streams based on Incremental Flexible Naive Bayes prediction at leaf nodes

Cited by: 5
Authors
Hemalatha, C. Sweetlin [1]
Pathak, Ravi [2]
Vaidehi, V. [1]
Affiliations
[1] VIT, Sch Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] Secretariat Copenhagen, Global Biodivers Informat Facil, Copenhagen, Denmark
Keywords
Data stream mining; Decision trees; Hoeffding bound; Kernel density estimation; Incremental Flexible Naive Bayes; Mining data streams
DOI
10.1007/s12065-019-00252-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Mining data streams in a single pass with constant memory is a challenging task. Decision trees are among the most popular classifiers for both batch and incremental learning because of their high interpretability, ease of construction, and good accuracy. The most popular decision tree for stream classification is the Hoeffding Tree, which is based on the Hoeffding bound. The literature also reports a few decision tree variants based on different bounds. The default class prediction method in decision trees is the "majority class" approach. Prediction accuracy was later improved by a hybrid decision tree in which a Naive Bayes classifier is used for prediction. Flexible Naive Bayes employs Kernel Density Estimation (KDE) for classification. However, it is suited to modeling static data sets. This paper proposes an Incremental Flexible Naive Bayes (IFNB) based hybrid decision tree paradigm that uses KDE to model continuous attributes at the leaf nodes of the tree to improve class prediction accuracy. Experimental results on both synthetic and real datasets show that the proposed IFNB-based leaf classifiers improve on the class prediction methods adopted in existing decision trees for data streams.
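The two ingredients named in the abstract, the Hoeffding bound and a KDE-based naive Bayes leaf predictor, can be illustrated with a minimal Python sketch. This is not the paper's IFNB algorithm, whose details are not given in this record. The class name `FlexibleNBLeaf`, its methods, and the bandwidth rule `h = 1 / sqrt(n)` are assumptions made for illustration only. The sketch also stores every observed value, so unlike a true stream learner it does not run in constant memory.

```python
import math
from collections import defaultdict


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class FlexibleNBLeaf:
    """Hypothetical leaf predictor: naive Bayes with a Gaussian kernel
    centred on every stored attribute value (illustration only)."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # values[class_label][attribute_index] -> list of observed values
        self.values = defaultdict(lambda: defaultdict(list))

    def update(self, x, y):
        """Absorb one labelled instance; no retraining pass is needed."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.values[y][i].append(float(v))

    @staticmethod
    def _kde(points, v):
        """Average of Gaussian kernels centred on the stored points."""
        n = len(points)
        h = 1.0 / math.sqrt(n)  # bandwidth heuristic (assumption)
        total = sum(math.exp(-0.5 * ((v - p) / h) ** 2) for p in points)
        return total / (n * h * math.sqrt(2.0 * math.pi))

    def predict(self, x):
        """Return the class with the highest log posterior, or None if empty."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for i, v in enumerate(x):
                density = self._kde(self.values[label][i], float(v))
                score += math.log(density + 1e-300)  # guard against log(0)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


leaf = FlexibleNBLeaf()
for x, y in [([0.0], "a"), ([0.1], "a"), ([5.0], "b"), ([5.2], "b")]:
    leaf.update(x, y)
print(leaf.predict([0.05]), leaf.predict([5.1]))
print(hoeffding_bound(1.0, 1e-7, 1000))
```

In a Hoeffding Tree, a bound like `hoeffding_bound` is what decides when enough instances have been seen to split a leaf. A predictor like `FlexibleNBLeaf` would replace the majority-class vote at that leaf until the split happens.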
Pages: 515-526
Number of pages: 12