An adaptive XGBoost-based optimized sliding window for concept drift handling in non-stationary spatiotemporal data streams classifications

Cited by: 3
Authors
Angbera, Ature [1,2]
Chan, Huah Yong [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Minden 11800, Pulau Pinang, Malaysia
[2] Joseph Sarwuan Tarka Univ, Dept Comp Sci, Makurdi, Nigeria
Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / No. 6
Keywords
Concept drift; Machine learning; Sliding windows; Spatiotemporal data streams; Bayesian optimization
DOI
10.1007/s11227-023-05729-8
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In recent years, the use of data science for decision-making has grown significantly. With the increasing adoption of spatial and temporal data-streaming applications, this growth has brought a significant learning challenge known as concept drift, which can severely degrade the models used in these applications. This work introduces a new model, BOASWIN-XGBoost (Bayesian Optimized Adaptive Sliding Window and XGBoost), to handle concept drift. Designed explicitly for classifying streaming data, the model comprises three main procedures: pre-processing, concept drift detection, and classification. BOASWIN-XGBoost uses a Bayesian-Optimized Adaptive Sliding Window (BOASWIN) to detect concept drift in the streaming data and an optimized XGBoost (eXtreme Gradient Boosting) model for classification. The hyperparameter-tuning approach BO-TPE (Bayesian Optimization with Tree-structured Parzen Estimator) is used to fine-tune the XGBoost model's parameters, improving the classifier's performance. The proposed approach was evaluated on seven streaming datasets: Agrawal_a, Agrawal_g, SEA_a, SEA_g, Hyperplane, Phishing, and Weather. In simulation, the model achieves accuracies of 70.83%, 71.02%, 76.76%, 76.96%, 84.26%, 95.53%, and 78.35% on these datasets, respectively, affirming its superior performance in handling concept drift and classifying streaming data.
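The abstract does not specify how BOASWIN decides when its window should shrink. As an illustration only, the sketch below implements a simplified detector in the style of ADWIN (the adaptive-windowing algorithm of Bifet and Gavaldà). This is an assumption about the general technique, not the authors' BOASWIN: the Bayesian optimization of the window's parameters and the XGBoost classifier are both omitted. The class `SimpleAdaptiveWindow`, its parameters `delta`, `max_size`, and `min_sub`, and the helper `simulate` are hypothetical names. One common arrangement, also assumed here, is to feed the detector the classifier's 0/1 prediction errors and to retrain the classifier on the retained window after each detection.

```python
import math
import random
from collections import deque


class SimpleAdaptiveWindow:
    """Simplified ADWIN-style drift detector (hypothetical sketch).

    Stores recent values in [0, 1] (e.g. 0/1 prediction errors). Drift is
    signalled when some split of the window into an older and a newer part
    yields sub-window means that differ by more than a Hoeffding-style bound;
    the older part is then discarded so the window tracks the new concept.
    """

    def __init__(self, delta=0.002, max_size=300, min_sub=5):
        self.delta = delta        # confidence parameter (assumed default)
        self.max_size = max_size  # hard cap on window length
        self.min_sub = min_sub    # minimum length of either sub-window
        self.window = deque()

    def _cut_threshold(self, n0, n1, n):
        # ADWIN-style epsilon_cut for values bounded in [0, 1]
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        delta_prime = self.delta / n
        return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

    def update(self, x):
        """Append one value; return True if drift was detected."""
        self.window.append(x)
        if len(self.window) > self.max_size:
            self.window.popleft()
        drift = False
        shrunk = True
        while shrunk and len(self.window) >= 2 * self.min_sub:
            shrunk = False
            values = list(self.window)
            n, total, prefix = len(values), sum(values), 0.0
            # i = size of the older sub-window (the oldest i values)
            for i in range(1, n - self.min_sub + 1):
                prefix += values[i - 1]
                if i < self.min_sub:
                    continue
                mean_old = prefix / i
                mean_new = (total - prefix) / (n - i)
                if abs(mean_old - mean_new) > self._cut_threshold(i, n - i, n):
                    for _ in range(i):  # drop the older sub-window
                        self.window.popleft()
                    drift = shrunk = True
                    break
        return drift


def simulate(seed=0, n_steps=2000, change_at=1000):
    """Feed a synthetic 0/1 error stream whose error rate jumps from 0.1
    to 0.6 at `change_at` (simulated drift); return detection times."""
    rng = random.Random(seed)
    detector = SimpleAdaptiveWindow()
    detections = []
    for t in range(n_steps):
        p_err = 0.1 if t < change_at else 0.6
        err = 1 if rng.random() < p_err else 0
        if detector.update(err):
            detections.append(t)
    return detections


print(simulate())
```

Here `delta` trades sensitivity against false alarms: a smaller value raises the cut threshold. Tuning such window parameters with a Bayesian search, in the spirit of the "Bayesian-Optimized" window the abstract names, is one plausible use of BO, but how the authors actually do it is not stated in this record.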
Pages: 7781-7811
Number of pages: 31