Efficient unsupervised drift detector for fast and high-dimensional data streams

Authors
Vinicius M. A. Souza
Antonio R. S. Parmezan
Farhan A. Chowdhury
Abdullah Mueen
Affiliations
[1] Pontifícia Universidade Católica do Paraná
[2] University of New Mexico
[3] University of São Paulo
Source
Knowledge and Information Systems | 2021 / Vol. 63
Keywords
Data stream; Concept drift; Unsupervised drift detector
Abstract
Stream mining deals with examples that arrive online at high speed, and with the possibility that their descriptive features or class definitions change relative to past knowledge (i.e., concept drift). Fast drift detection is essential to keep the predictive model updated and stable in changing environments. In many applications, such as those involving smart sensors, the large number of features is an additional challenge because it increases the memory and time needed for stream processing. This paper presents an unsupervised, model-independent concept drift detector suitable for high-speed, high-dimensional data streams. We propose a straightforward two-dimensional data representation that allows faster processing of datasets with large numbers of examples and dimensions. On this visual representation, we developed an adaptive drift detector that is efficient for fast streams with thousands of features. It is as accurate as existing costly methods that run separate statistical tests on each feature individually. Our method achieves better performance, measured by execution time and classification accuracy, for different types of drift. The experimental evaluation on synthetic and real data demonstrates the method's versatility across several domains, including entomology, medicine, and transportation systems.
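The abstract does not spell out the detector's internals. The sketch below is a rough illustration of the general idea under one plausible reading, not the authors' algorithm. It makes three assumptions:

- Each window of examples is stacked into a 2-D matrix (rows = examples, columns = features) and scaled to [0, 1], like a grayscale image.
- The matrix is compared with a reference window by cell-wise mean squared error (MSE).
- Drift is flagged when that single score exceeds an adaptive threshold of mean + k standard deviations over recent scores.

For independent rows, the expected squared difference in a cell equals the sum of the two variances plus the squared difference of the means. The score therefore rises when feature means or spreads change. This yields one scalar test per window rather than one statistical test per feature.

All names here are hypothetical and not taken from the paper: `to_matrix`, `mse`, `MatrixDriftSketch`, the scaling bounds `lo`/`hi`, the k-sigma rule, and the parameters `k`, `history` and `warmup`.

```python
import random
import statistics
from collections import deque
from typing import List, Sequence

Matrix = List[List[float]]  # rows = examples, columns = features


def to_matrix(examples: Sequence[Sequence[float]], lo: float, hi: float) -> Matrix:
    """Stack a window of examples into a 2-D matrix, min-max scaled and clipped to [0, 1]."""
    span = (hi - lo) or 1.0
    return [[min(max((x - lo) / span, 0.0), 1.0) for x in row] for row in examples]


def mse(a: Matrix, b: Matrix) -> float:
    """Cell-wise mean squared error between two equally sized matrices."""
    cells = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / cells


class MatrixDriftSketch:
    """Hypothetical detector: one MSE score per window, adaptive k-sigma threshold."""

    def __init__(self, reference: Sequence[Sequence[float]], lo: float, hi: float,
                 k: float = 5.0, history: int = 50, warmup: int = 20) -> None:
        self.lo, self.hi, self.k, self.warmup = lo, hi, k, warmup
        self.ref = to_matrix(reference, lo, hi)
        self.scores = deque(maxlen=history)  # recent scores under the current concept

    def update(self, window: Sequence[Sequence[float]]) -> bool:
        """Score a new window against the reference; return True if drift is flagged."""
        cur = to_matrix(window, self.lo, self.hi)
        score = mse(self.ref, cur)
        if len(self.scores) >= self.warmup:
            mu = statistics.fmean(self.scores)
            sd = statistics.pstdev(self.scores)
            if score > mu + self.k * sd:
                # Drift: adopt the current window as the new reference and relearn the threshold.
                self.ref = cur
                self.scores.clear()
                return True
        self.scores.append(score)
        return False


# Toy stream (assumed): 50 features, windows of 20 examples,
# feature means shift from 0.0 to 1.5 at window 60.
rng = random.Random(7)
D, W, CHANGE, TOTAL = 50, 20, 60, 80


def make_window(mean: float) -> List[List[float]]:
    return [[rng.gauss(mean, 1.0) for _ in range(D)] for _ in range(W)]


detector = MatrixDriftSketch(make_window(0.0), lo=-4.0, hi=6.0)
flags = [detector.update(make_window(0.0 if t < CHANGE else 1.5)) for t in range(TOTAL)]
print("drift flagged at windows:", [t for t, f in enumerate(flags) if f])
```

On this toy stream, the mean shift adds roughly 0.15² ≈ 0.02 to the scaled score, which is well above the k-sigma band of the pre-drift scores. The flag should therefore appear at or just after window 60. A practical detector would also need to choose the window size, the scaling bounds and `k` for the data at hand.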
Pages: 1497–1527
Page count: 30