Online learning for data streams with bi-dynamic distributions

Cited by: 1
Authors
Yan, Huigui [1, 2]
Liu, Jiale [1, 2]
Xiao, Jiawei [1, 2]
Niu, Shina [1, 3]
Dong, Siqi [1, 2]
You, Dianlong [1, 2]
Shen, Limin [1, 2]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao 066004, Hebei, Peoples R China
[2] Yanshan Univ, Key Lab Software Engn Hebei Prov, Qinhuangdao 066004, Hebei, Peoples R China
[3] Henan Inst Technol, Sch Comp Sci & Technol, Xinxiang 453004, Henan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Online learning; Data streams; Dynamic feature spaces; Dynamic data distribution;
DOI
10.1016/j.ins.2024.120796
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Data streams, an important form of big data, require online real-time processing because instances arrive one by one and are fleeting. Existing online learning methods each make a restrictive assumption, such as a fixed feature space, a varying feature space that follows specific patterns, or a fixed data distribution. However, data streams generated in real-world scenarios typically have both randomly changing feature spaces and changing data distributions, which makes existing methods unsuitable for practical applications. To fill this gap, this study proposes a novel Online Learning for Data Streams with Bi-dynamic Distributions (OLBD) algorithm. The main idea of OLBD is two-fold: 1) it overcomes random changes in the feature space by building a mapping matrix that transforms the space and projects the original instances onto the global feature space; 2) it handles dynamic data distributions by constraining prior knowledge and transferring established mapping relationships to new distributions. To evaluate OLBD, we compared it with related state-of-the-art algorithms. First, we used 13 datasets to simulate three scenarios of dynamic feature space, namely trapezoidal, feature evolvable, and capricious data streams. Second, we simulated data streams with dynamic data distributions using eight real and four generated datasets. We then conducted ablation studies on the parameter alpha. Finally, we analyzed data streams with bi-dynamic distributions under different feature missing ratios and verified the generalization ability of OLBD. The results show that OLBD significantly outperforms its rivals. Additionally, a practical case study on movie review classification was conducted to illustrate the effectiveness of OLBD in real-world scenarios.
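The idea of projecting instances from a changing feature space onto a global feature space can be illustrated with a minimal sketch. This is not the OLBD algorithm: the abstract does not specify how the mapping matrix is learned, so the sketch assumes a fixed 0/1 selection matrix (missing features become zeros) and swaps in a plain perceptron as the online learner. All function names (`selection_matrix`, `project`, `perceptron_step`, `capricious_stream`) and the `keep_prob` parameter are hypothetical and chosen only for illustration.

```python
import random


def selection_matrix(observed_idx, d_global):
    """Assumed stand-in for a mapping matrix: a d_global x k 0/1 matrix
    that places the j-th observed feature at its global index."""
    return [[1.0 if g == idx else 0.0 for idx in observed_idx]
            for g in range(d_global)]


def project(matrix, x_obs):
    """Matrix-vector product: maps an observed instance to the global space."""
    return [sum(m * x for m, x in zip(row, x_obs)) for row in matrix]


def perceptron_step(w, x, y, lr=1.0):
    """One online perceptron update (labels in {-1, +1}).
    The weights change only when the current instance is misclassified."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if y * score <= 0:
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def capricious_stream(instances, keep_prob, seed=0):
    """Simulate a randomly changing feature space: each feature of each
    instance is kept independently with probability keep_prob."""
    rng = random.Random(seed)
    for x, y in instances:
        idx = [i for i in range(len(x)) if rng.random() < keep_prob]
        yield idx, [x[i] for i in idx], y


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data = [([1.0, 0.5, -1.0, 2.0], 1), ([-1.0, -0.5, 1.0, -2.0], -1)] * 5
    w = [0.0] * 4
    for idx, x_obs, y in capricious_stream(data, keep_prob=0.7):
        x_global = project(selection_matrix(idx, 4), x_obs)
        w = perceptron_step(w, x_global, y)
    print(w)
```

In this sketch the learner always operates on vectors of the same global dimension, regardless of which features an instance carries; a learned mapping matrix, as described in the abstract, would replace the zero-filling selection matrix.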
Pages: 21
Related papers
42 in total
[1]   RDDM: Reactive drift detection method [J].
Barros, Roberto S. M. ;
Cabral, Danilo R. L. ;
Goncalves, Paulo M., Jr. ;
Santos, Silas G. T. C. .
EXPERT SYSTEMS WITH APPLICATIONS, 2017, 90 :344-355
[2]   From concept drift to model degradation: An overview on performance-aware drift detectors [J].
Bayram, Firas ;
Ahmed, Bestoun S. ;
Kassler, Andreas .
KNOWLEDGE-BASED SYSTEMS, 2022, 245
[3]   Beyazit E, 2019, AAAI CONF ARTIF INTE, P3232
[4]   Bifet A, 2010, JMLR WORKSH CONF PRO, V11, P44
[5]   Boyd S, 2004, CONVEX OPTIMIZATION, DOI 10.1017/CBO9780511804441
[6]   The impact of data difficulty factors on classification of imbalanced and concept drifting data streams [J].
Brzezinski, Dariusz ;
Minku, Leandro L. ;
Pewinski, Tomasz ;
Stefanowski, Jerzy ;
Szumaczuk, Artur .
KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (06) :1429-1469
[7]   Denoising Aggregation of Graph Neural Networks by Using Principal Component Analysis [J].
Dong, Wei ;
Wozniak, Marcin ;
Wu, Junsheng ;
Li, Weigang ;
Bai, Zongwen .
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (03) :2385-2394
[8]   Gama J, 2004, LECT NOTES ARTIF INT, V3171, P286
[9]   Unsupervised concept drift detection for multi-label data streams [J].
Gulcan, Ege Berkay ;
Can, Fazli .
ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (03) :2401-2434
[10]   Concept drift type identification based on multi-sliding windows [J].
Guo, Husheng ;
Li, Hai ;
Ren, Qiaoyan ;
Wang, Wenjian .
INFORMATION SCIENCES, 2022, 585 :1-23