Detecting group concept drift from multiple data streams

Times cited: 37
Authors
Yu, Hang [1 ]
Liu, Weixu [1 ]
Lu, Jie [2 ]
Wen, Yimin [3 ]
Luo, Xiangfeng [1 ]
Zhang, Guangquan [2 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, 333 Nanchen Rd, Shanghai 200444, Peoples R China
[2] Univ Technol Sydney, Fac Engn & Informat Technol, POB 123, Sydney, NSW 2007, Australia
[3] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, 1, Jinji Rd, Qixing Dist, Guilin 541004, Guangxi, Peoples R China
Funding
Australian Research Council;
Keywords
Concept drift; Data streams; Online learning; Hypothesis test; ONLINE;
DOI
10.1016/j.patcog.2022.109113
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Concept drift, caused by unforeseeable changes in the underlying distribution of data, may lead to a sharp downturn in the performance of algorithms that learn from streaming data. In this paper, we are mainly concerned with concept drift across multiple data streams, in situations where the drift of each individual data stream cannot be detected in time because its underlying distribution drifts only slightly. We call this group concept drift. Compared with detecting concept drift in a single data stream, detecting group concept drift is challenging in three respects: first, the training data become more complex; second, the underlying distribution becomes more complex; and third, the correlations between data streams become more complex. To address these challenges, the key idea of our method is to construct a distribution-free test statistic, i.e., one that does not depend on any particular underlying distribution of the multiple data streams. Then, for streaming data, we design an online learning algorithm to obtain this test statistic, so that concept drift can be determined by a hypothesis test. Experimental evaluations on both synthetic and real-world datasets show that our method can accurately detect concept drift from multiple data streams. (c) 2022 Elsevier Ltd. All rights reserved.
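To make the idea of group concept drift concrete, the following is a minimal sketch, assuming time-aligned streams with equal window lengths and drift that shows up as a small mean shift in each stream. It is not the authors' test statistic or their online learning algorithm: it swaps in a generic permutation test (distribution-free, because its null distribution comes from reshuffling the data rather than from a parametric model) over a hypothetical statistic, the sum across streams of squared differences in window means. The function names `group_statistic` and `group_drift_test` and all parameters are illustrative assumptions, and the test runs on fixed reference/current windows rather than being updated online.

```python
import random
from typing import Sequence, Tuple


def group_statistic(ref: Sequence[Sequence[float]],
                    cur: Sequence[Sequence[float]]) -> float:
    """Hypothetical group statistic: sum over streams of the squared
    difference between the current-window mean and the reference-window mean.
    Each slight per-stream shift adds a little; summed over many streams
    the shifts accumulate into a detectable signal."""
    total = 0.0
    for r, c in zip(ref, cur):
        diff = sum(c) / len(c) - sum(r) / len(r)
        total += diff * diff
    return total


def group_drift_test(ref: Sequence[Sequence[float]],
                     cur: Sequence[Sequence[float]],
                     n_perm: int = 200,
                     alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, bool]:
    """Permutation test for a joint drift across time-aligned streams.

    Returns (p_value, drift_detected). Under H0 (no drift) the time steps of
    the pooled windows are exchangeable, so shuffling them yields a null
    distribution without assuming any parametric form.
    """
    if len(ref) != len(cur) or not ref:
        raise ValueError("ref and cur must hold the same non-zero number of streams")
    n_ref, n_cur = len(ref[0]), len(cur[0])
    if n_ref == 0 or n_cur == 0:
        raise ValueError("windows must be non-empty")
    if any(len(r) != n_ref for r in ref) or any(len(c) != n_cur for c in cur):
        raise ValueError("all streams must share the same window lengths")

    pooled = [list(r) + list(c) for r, c in zip(ref, cur)]
    observed = group_statistic(ref, cur)
    rng = random.Random(seed)
    idx = list(range(n_ref + n_cur))
    exceed = 0
    for _ in range(n_perm):
        # One shuffle of the time indices, applied to every stream, keeps the
        # cross-stream correlation at each time step intact.
        rng.shuffle(idx)
        p_ref = [[s[i] for i in idx[:n_ref]] for s in pooled]
        p_cur = [[s[i] for i in idx[n_ref:]] for s in pooled]
        if group_statistic(p_ref, p_cur) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return p_value, p_value < alpha


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Ten streams, each shifted by only 0.5 standard deviations.
    ref = [[data_rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(10)]
    cur = [[data_rng.gauss(0.5, 1.0) for _ in range(100)] for _ in range(10)]
    print(group_drift_test(ref, cur))  # small p-value, drift flagged
    print(group_drift_test(ref, ref))  # (1.0, False): identical windows
```

Using a single shuffle of time indices for all streams is a deliberate choice: it respects whatever correlation exists between streams at the same time step, which is one of the complications the abstract highlights. A fully online variant would maintain the window means incrementally instead of recomputing them.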
Pages: 11