On the reliable detection of concept drift from streaming unlabeled data

Cited by: 107
Authors
Sethi, Tegjyot Singh [1]
Kantardzic, Mehmed [1]
Affiliations
[1] Univ Louisville, Data Min Lab, Louisville, KY 40292 USA
Keywords
Concept drift; Streaming data; Unlabeled; Margin density; Ensemble; Cybersecurity; CLASSIFIERS; ENSEMBLES; MARGIN
DOI
10.1016/j.eswa.2017.04.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used to validate the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from excluding the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms compared to unsupervised feature-based drift detectors. At the same time, it produces performance comparable to that of a fully labeled drift detector. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label-efficient and general in its applicability. (C) 2017 Elsevier Ltd. All rights reserved.
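The idea of tracking how many samples fall in a classifier's uncertainty region can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the uncertainty region or the detection threshold is defined, so the symmetric margin `|score| <= margin`, the fixed `tolerance`, and the function names `margin_density` and `drift_suspected` are all assumptions made here for illustration. The sketch also compares two fixed windows of scores rather than updating incrementally.

```python
from typing import Sequence


def margin_density(scores: Sequence[float], margin: float = 1.0) -> float:
    """Fraction of samples whose decision score lies inside the margin.

    `scores` are signed decision values from any classifier (assumption:
    values near zero mean the classifier is uncertain). No labels are used.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    inside = sum(1 for s in scores if abs(s) <= margin)
    return inside / len(scores)


def drift_suspected(
    reference_scores: Sequence[float],
    current_scores: Sequence[float],
    margin: float = 1.0,
    tolerance: float = 0.1,
) -> bool:
    """Flag drift when margin density moves away from its reference value.

    `tolerance` is a hypothetical fixed threshold chosen for this sketch;
    the paper's actual thresholding rule is not given in the abstract.
    """
    reference = margin_density(reference_scores, margin)
    current = margin_density(current_scores, margin)
    return abs(current - reference) > tolerance


# Usage: scores from a training-time window versus a later unlabeled window.
reference_window = [2.0, -2.5, 0.5, 3.0]   # one of four samples in the margin
later_window = [0.1, -0.3, 0.8, 2.0]       # three of four samples in the margin
print(drift_suspected(reference_window, later_window))
```

Because only decision scores are needed, a check of this kind requires no labels, which is the property the abstract emphasizes; a flagged window could then be sent for labeling to confirm whether accuracy has actually dropped.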
Pages: 77-99
Page count: 23