On the reliable detection of concept drift from streaming unlabeled data

Cited by: 106

Authors:

Sethi, Tegjyot Singh [1]

Kantardzic, Mehmed [1]

Affiliations:

[1] Univ Louisville, Data Min Lab, Louisville, KY 40292 USA

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2017, Vol. 82

Keywords:

Concept drift; Streaming data; Unlabeled; Margin density; Ensemble; Cybersecurity; CLASSIFIERS; ENSEMBLES; MARGIN;

DOI:

10.1016/j.eswa.2017.04.008

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and be able to adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used for validating the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from the exclusion of the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on six drift-induced datasets and four additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms compared to unsupervised feature-based drift detectors. At the same time, it produces performance comparable to that of a fully labeled drift detector. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label-efficient and general in its applicability. © 2017 Elsevier Ltd. All rights reserved.
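
The idea of tracking how many unlabeled samples fall into a classifier's uncertainty region can be illustrated with a minimal sketch. It is not the authors' MD3 implementation: the linear scoring function, the margin width of 1.0, the fixed tolerance rule, and all function names below are assumptions made for illustration only.

```python
# Illustrative sketch only: a margin-density style drift check.
# Assumptions (not from the paper): a fixed linear classifier, a margin
# band of |score| <= 1.0, and a simple absolute-difference tolerance rule.

def decision_score(x, weights, bias):
    """Signed score of a hypothetical linear classifier for one sample."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def margin_density(scores, margin=1.0):
    """Fraction of samples whose score lies inside the uncertainty band."""
    if not scores:
        raise ValueError("scores must be non-empty")
    inside = sum(1 for s in scores if abs(s) <= margin)
    return inside / len(scores)


def detect_drift(reference_density, window_density, tolerance):
    """Flag drift when the density moves away from the reference value."""
    return abs(window_density - reference_density) > tolerance


# Hypothetical classifier parameters and two unlabeled windows.
weights, bias = (1.0, -1.0), 0.0
reference_window = [(3.0, 0.0), (0.0, 3.0), (4.0, 1.0), (1.0, 4.0), (0.5, 0.0)]
new_window = [(1.0, 0.5), (0.2, 0.0), (2.0, 1.5), (3.0, 0.0), (0.0, 0.4)]

ref_density = margin_density(
    [decision_score(x, weights, bias) for x in reference_window]
)
new_density = margin_density(
    [decision_score(x, weights, bias) for x in new_window]
)

# No labels are used anywhere: only the classifier's own scores.
drift_flag = detect_drift(ref_density, new_density, tolerance=0.3)
print(ref_density, new_density, drift_flag)
```

In this toy setup most samples in the second window sit close to the decision boundary, so the density rises and the check fires. How the reference value and the tolerance would be chosen in practice is left open here.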

Pages: 77-99

Page count: 23

50 items in total

[1] Fast concept drift detection using unlabeled data
Shang, Dan
Zhang, Guangquan
Lu, Jie
DEVELOPMENTS OF ARTIFICIAL INTELLIGENCE TECHNOLOGIES IN COMPUTATION AND ROBOTICS, 2020, 12 : 133 - 140
[2] No Free Lunch Theorem for concept drift detection in streaming data classification: A review
Hu, Hanqing
Kantardzic, Mehmed
Sethi, Tegjyot S.
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 10 (02)
[3] An Optimal Big Data Analytics with Concept Drift Detection on High-Dimensional Streaming Data
Mansour, Romany F.
Al-Otaibi, Shaha
Al-Rasheed, Amal
Aljuaid, Hanan
Pustokhina, Irina V.
Pustokhin, Denis A.
CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (03) : 2843 - 2858
[4] Incremental Learning of Concept Drift from Streaming Imbalanced Data
Ditzler, Gregory
Polikar, Robi
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (10) : 2283 - 2301
[5] Handling adversarial concept drift in streaming data
Sethi, Tegjyot Singh
Kantardzic, Mehmed
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 97 : 18 - 40
[6] Concept Drift Detection on Streaming Data under Limited Labeling
Kim, Young In
Park, Cheong Hee
2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2016 : 273 - 280
[7] Streaming Data Classification with Concept Drift
Althabiti, Mashail
Abdullah, Manal
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2019, 12 (01) : 177 - 184
[8] Ensemble framework for concept-drift detection in multidimensional streaming data
Prasad K.S.N.
Rao A.S.
Ramana A.V.
International Journal of Computers and Applications, 2022, 44 (12) : 1193 - 1200
[9] Learning from streaming data with concept drift and imbalance: an overview
Hoens, T. Ryan
Polikar, Robi
Chawla, Nitesh V.
PROGRESS IN ARTIFICIAL INTELLIGENCE, 2012, 1 (01) : 89 - 101