An adaptive XGBoost-based optimized sliding window for concept drift handling in non-stationary spatiotemporal data streams classifications

Cited by: 3
Authors
Angbera, Ature [1, 2]
Chan, Huah Yong [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Minden 11800, Pulau Pinang, Malaysia
[2] Joseph Sarwuan Tarka Univ, Dept Comp Sci, Makurdi, Nigeria
Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 06
Keywords
Concept drift; Machine learning; Sliding windows; Spatiotemporal data streams; Bayesian optimization
DOI
10.1007/s11227-023-05729-8
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data science is increasingly used for decision-making. The growing use of spatial and temporal data-streaming applications has made concept drift a significant learning challenge, because drift can severely degrade the models these applications rely on. This work introduces BOASWIN-XGBoost (Bayesian Optimized Adaptive Sliding Window and XGBoost), a model for handling concept drift when classifying streaming data. It comprises three main procedures: pre-processing, concept drift detection, and classification. The Bayesian-Optimized Adaptive Sliding Window (BOASWIN) method detects concept drift in the streaming data, and an optimized XGBoost (eXtreme Gradient Boosting) model performs the classification. The XGBoost hyperparameters are tuned with BO-TPE (Bayesian Optimization with Tree-structured Parzen Estimator) to improve the classifier's performance. The approach was evaluated on seven streaming datasets: Agrawal_a, Agrawal_g, SEA_a, SEA_g, Hyperplane, Phishing, and Weather. The simulation results show that the model achieves accuracies of 70.83%, 71.02%, 76.76%, 76.96%, 84.26%, 95.53%, and 78.35% on these datasets, respectively, affirming its superior performance in handling concept drift and classifying streaming data.
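The abstract does not give the details of BOASWIN, so the following is only a minimal sketch of the general idea of sliding-window drift detection, not the authors' method. It assumes drift is flagged when accuracy over the most recent predictions falls well below accuracy over the older part of the window. The class name, the function name, and the parameters `window_size`, `recent_size`, and `drop_threshold` are all hypothetical; they stand in for the kind of settings a Bayesian optimizer could tune.

```python
from collections import deque


class SlidingWindowDriftDetector:
    """Flags drift when recent accuracy drops well below older accuracy.

    Illustrative sketch only; this is NOT the paper's BOASWIN procedure.
    """

    def __init__(self, window_size=200, recent_size=50, drop_threshold=0.15):
        if not 0 < recent_size < window_size:
            raise ValueError("recent_size must be in (0, window_size)")
        # Each entry is 1 for a correct prediction, 0 for a wrong one.
        self.window = deque(maxlen=window_size)
        self.recent_size = recent_size
        self.drop_threshold = drop_threshold

    def update(self, correct):
        """Record one prediction outcome; return True if drift is flagged."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        outcomes = list(self.window)
        older = outcomes[:-self.recent_size]
        recent = outcomes[-self.recent_size:]
        older_acc = sum(older) / len(older)
        recent_acc = sum(recent) / len(recent)
        if older_acc - recent_acc > self.drop_threshold:
            self.window.clear()  # discard stale outcomes after a drift
            return True
        return False


def first_drift_index(outcomes, **kwargs):
    """Return the index of the first flagged drift, or None if none occurs."""
    detector = SlidingWindowDriftDetector(**kwargs)
    for i, correct in enumerate(outcomes):
        if detector.update(correct):
            return i
    return None


if __name__ == "__main__":
    # A stream whose predictions are all correct, then all wrong.
    stream = [True] * 300 + [False] * 100
    print(first_drift_index(stream))
```

In a full pipeline, a flagged drift would typically trigger retraining of the classifier on recent data; that step is omitted here because the abstract does not specify how the model is updated.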
Pages: 7781-7811
Page count: 31
Related papers
43 in total
  • [41] AMANDA: Semi-supervised density-based adaptive model for non-stationary data with extreme verification latency
    Ferreira, Raul S.
    Zimbrao, Geraldo
    Alvim, Leandro G. M.
    INFORMATION SCIENCES, 2019, 488 : 219 - 237
  • [42] Method for measuring non-stationary motion attitude based on MEMS-IMU array data fusion and adaptive filtering
    Lan, Jianping
    Wang, Kaixuan
    Song, Sujing
    Li, Kunpeng
    Liu, Cheng
    He, Xiaowei
    Hou, Yuqing
    Tang, Sheng
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (08)
  • [43] A Time-Frequency Domain Detection Method for Measurement Data of Non-Stationary Signals Based on Optimized Hilbert-Huang Transform
    Zhu, Caiyun
    Cao, Tianyu
    Zhao, Xiaoqun
    Yang, Yichen
    Xu, Zhongwei
    IEEE INSTRUMENTATION & MEASUREMENT MAGAZINE, 2023, 26 (02) : 29 - 39