On the reliable detection of concept drift from streaming unlabeled data

Cited by: 107
Authors
Sethi, Tegjyot Singh [1]
Kantardzic, Mehmed [1]
Affiliations
[1] Univ Louisville, Data Min Lab, Louisville, KY 40292 USA
Keywords
Concept drift; Streaming data; Unlabeled; Margin density; Ensemble; Cybersecurity; CLASSIFIERS; ENSEMBLES; MARGIN;
DOI
10.1016/j.eswa.2017.04.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used to validate the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from the exclusion of the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms than unsupervised feature-based drift detectors. At the same time, it produces performance comparable to that of a fully labeled drift detector. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme that is credible, label-efficient and general in its applicability. (C) 2017 Elsevier Ltd. All rights reserved.
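The idea of tracking how many samples fall in a classifier's uncertainty region can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the uncertainty region or the detection rule is defined, so the sketch assumes a linear classifier whose uncertainty region is |w·x + b| <= 1, and a fixed deviation threshold. The names `margin_density`, `drift_suspected`, `margin` and `threshold` are hypothetical.

```python
import numpy as np

def margin_density(X, w, b, margin=1.0):
    """Fraction of unlabeled samples whose score w.x + b lies inside the margin.

    Assumption: the uncertainty region is |w.x + b| <= margin for a linear model.
    """
    scores = X @ w + b
    return float(np.mean(np.abs(scores) <= margin))

def drift_suspected(md_reference, md_current, threshold=0.1):
    """Flag a possible drift when margin density moves away from its reference.

    Assumption: a fixed absolute threshold; the abstract does not specify the rule.
    """
    return abs(md_current - md_reference) > threshold

# Toy linear classifier that splits on the first feature (illustrative only).
w = np.array([1.0, 0.0])
b = 0.0

# Reference window: most samples lie far from the decision boundary.
X_ref = np.array([[3.0, 0.0], [-3.0, 1.0], [2.5, -1.0], [0.5, 0.0]])
# Later window: samples have moved into the uncertainty region.
X_new = np.array([[0.2, 0.0], [-0.4, 1.0], [0.9, -1.0], [3.0, 0.0]])

md_ref = margin_density(X_ref, w, b)
md_new = margin_density(X_new, w, b)
print(md_ref, md_new, drift_suspected(md_ref, md_new))
```

No labels are used anywhere in the sketch; only the classifier's scores on incoming samples are needed, which is what makes this style of detector unsupervised.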
Pages: 77-99
Page count: 23
Related papers
50 in total
  • [31] Learning from concept drifting data streams with unlabeled data
    Wu, Xindong
    Li, Peipei
    Hu, Xuegang
    NEUROCOMPUTING, 2012, 92 : 145 - 155
  • [32] Reconstruction-based unsupervised drift detection over multivariate streaming data
    Kaminskyi, Daniil
    Li, Bin
    Mueller, Emmanuel
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 807 - 813
  • [33] Streaming Data Classification Based on Hierarchical Concept Drift and Online Ensemble
    Liu, Ning
    Zhao, Jianhua
    IEEE ACCESS, 2023, 11 : 126040 - 126051
  • [34] Concept Drift Detection for Evolving Stream Data
    Lee, Jeonghoon
    Lee, Yoon-Joon
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (11) : 2288 - 2292
  • [35] Forgetful Forests: Data Structures for Machine Learning on Streaming Data under Concept Drift
    Yuan, Zhehu
    Sun, Yinqi
    Shasha, Dennis
    ALGORITHMS, 2023, 16 (06)
  • [36] Monitoring Classification Blindspots to Detect Drifts from Unlabeled Data
    Sethi, Tegjyot Singh
    Kantardzic, Mehmed
    Arabmakki, Elaheh
    PROCEEDINGS OF 2016 IEEE 17TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI), 2016, : 142 - 151
  • [37] Diversity measure as a new drift detection method in data streaming
    Mahdi, Osama A.
    Pardede, Eric
    Ali, Nawfal
    Cao, Jinli
    KNOWLEDGE-BASED SYSTEMS, 2020, 191
  • [38] Handling Concept Drift in Data Streams by Using Drift Detection Methods
    Patil, Malini M.
    DATA MANAGEMENT, ANALYTICS AND INNOVATION, ICDMAI 2018, VOL 2, 2019, 839 : 155 - 166
  • [39] Accumulating regional density dissimilarity for concept drift detection in data streams
    Liu, Anjin
    Lu, Jie
    Liu, Feng
    Zhang, Guangquan
    PATTERN RECOGNITION, 2018, 76 : 256 - 272
  • [40] Concept drift detection and accelerated convergence of online learning
    Guo, Husheng
    Li, Hai
    Sun, Ni
    Ren, Qiaoyan
    Zhang, Aijuan
    Wang, Wenjian
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (03) : 1005 - 1043