Probabilistic exact adaptive random forest for recurrent concepts in data streams

Cited by: 5
Authors
Wu, Ocean [1]
Koh, Yun Sing [1]
Dobbie, Gillian [1]
Lacombe, Thomas [1]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
Random forest; Recurring concepts; Concept drift; Data stream; CONCEPT DRIFTS
DOI
10.1007/s41060-021-00273-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To adapt random forests to the dynamic nature of data streams, the state-of-the-art technique discards trained trees and grows new trees when concept drifts are detected. This is particularly wasteful when recurrent patterns exist. In this work, we introduce a novel framework called PEARL, which uses both an exact technique and a probabilistic graphical model with Lossy Counting to replace drifted trees with relevant trees built in the past. The exact technique uses pattern matching to find the set of drifted trees that co-occurred in predictions in the past. Meanwhile, a probabilistic graphical model is built to capture the tree replacements among recurrent concept drifts. Once the graphical model becomes stable, it replaces the exact technique and finds relevant trees in a probabilistic fashion. Further, Lossy Counting is applied to the graphical model, which brings an added theoretical guarantee for both error rate and space complexity. We empirically show that our technique outperforms baselines in terms of accuracy and kappa on both synthetic and real-world datasets.
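The abstract relies on Lossy Counting to bound the memory used by the graphical model. As a rough illustration only, the following is a minimal sketch of the generic Lossy Counting algorithm, not the PEARL implementation. The class name `LossyCounter`, the parameter `epsilon`, and the idea of counting tree-replacement pairs as plain tuples are all assumptions made for this example.

```python
class LossyCounter:
    """Generic Lossy Counting sketch (hypothetical helper, not PEARL's code).

    Keeps approximate counts of stream items. For a stream of length n,
    each stored count undercounts the true count by at most epsilon * n.
    """

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        # Bucket width: number of items per bucket, ceil(1 / epsilon).
        self.width = int(-(-1 // epsilon))
        self.n = 0
        # item -> [count, delta], where delta is the maximum possible
        # undercount at the time the item was (re)inserted.
        self.entries = {}

    def add(self, item):
        self.n += 1
        # Current bucket id, i.e. ceil(n / width) in integer arithmetic.
        bucket = (self.n + self.width - 1) // self.width
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def estimate(self, item):
        """Return the stored (lower-bound) count, or 0 if the item was pruned."""
        entry = self.entries.get(item)
        return entry[0] if entry is not None else 0
```

In this sketch an item could be a pair such as `("tree_3", "tree_7")` standing for one observed replacement; these identifiers are purely illustrative. Rarely seen pairs are pruned at bucket boundaries, so the number of stored entries stays bounded, while a frequently seen pair keeps a count within `epsilon * n` of its true frequency.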
Pages: 17-32
Number of pages: 16