Analyzing and repairing concept drift adaptation in data stream classification

Cited by: 0
Authors
Ben Halstead
Yun Sing Koh
Patricia Riddle
Russel Pears
Mykola Pechenizkiy
Albert Bifet
Gustavo Olivares
Guy Coulson
Affiliations
[1] The University of Auckland, School of Computer Science
[2] Auckland University of Technology
[3] Eindhoven University of Technology
[4] University of Waikato
[5] LTCI
[6] Télécom Paris
[7] IP-Paris
[8] National Institute of Water and Atmospheric Research
Source
Machine Learning | 2022 / Vol. 111
Keywords
Concept drift; Data stream classification; Recurring concepts;
DOI
Not available
Abstract
Data collected over time often exhibit changes in distribution, or concept drift, caused by changes in factors relevant to the classification task, e.g. weather conditions. Incorporating all relevant factors into the model may capture these changes, but this is usually not practical. Data stream based methods, which instead explicitly detect concept drift, have been shown to retain performance under unknown changing conditions. These methods adapt to concept drift by training a model to classify each distinct data distribution. However, we hypothesize that existing methods do not robustly handle real-world tasks, leading to adaptation errors where context is misidentified. Adaptation errors may cause a system to use a model which does not fit the current data, reducing performance. We propose a novel repair algorithm to identify and correct errors in concept drift adaptation. Evaluation on synthetic data shows that our proposed AiRStream system achieves higher performance than baseline methods, while also better capturing the dynamics of the stream. Evaluation on an air quality inference task shows that AiRStream provides increased real-world performance compared to eight baseline methods. A case study shows that AiRStream is able to build a robust model of environmental conditions over this task, allowing the adaptations made to concept drift to be analysed and related to changes in weather. We discovered a strong predictive link between the adaptations made by AiRStream and changes in meteorological conditions.
Pages: 3489-3523
Page count: 34