Clustering of Data Streams With Dynamic Gaussian Mixture Models: An IoT Application in Industrial Processes

Cited by: 58
Authors
Diaz-Rozo, Javier [1, 2]
Bielza, Concha [2]
Larranaga, Pedro [2]
Affiliations
[1] Aingura IIoT, Elgoibar 20870, Spain
[2] Tech Univ Madrid, Dept Artificial Intelligence, Madrid 28660, Spain
Keywords
Concept drift; data stream; dynamic clustering; Gaussian mixture models (GMM); industrial Internet of Things (IIoT)
DOI
10.1109/JIOT.2018.2840129
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In industrial Internet of Things applications where sensors send dynamic process data at high speed, producing actionable insights at the right time is challenging. A key problem is processing a large amount of data while the underlying dynamic phenomena related to the machine may be evolving over time due to factors such as degradation. This makes any actionable model obsolete, so it needs to be updated. To cope with this problem, in this paper we propose a new unsupervised learning algorithm based on Gaussian mixture models, called Gaussian-based dynamic probabilistic clustering (GDPC), which mainly integrates and adapts three well-known algorithms for use in dynamic scenarios: the expectation-maximization (EM) algorithm to estimate the model parameters, and the Page-Hinkley test and Chernoff bound to detect concept drifts. Unlike other unsupervised methods, the model induced by GDPC provides the membership probabilities of each instance to each cluster. Through a Brier score analysis, this makes it possible to determine the robustness of the instance assignment and its evolution each time a concept drift is detected. Also, the algorithm works with very little data and significantly less computing power, and is able to decide whether (and when) to change the model. The algorithm is tested using synthetic data and data streams from an industrial testbed, where different operational states are automatically identified, giving good results in terms of classification accuracy, sensitivity, and specificity.
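As an illustration of the drift-detection ingredient named in the abstract, the following is a minimal sketch of the textbook Page-Hinkley test for an upward shift in the mean of a stream. It is not the GDPC implementation. The class name, the `first_drift` helper, the parameters `delta` and `threshold`, and the choice of monitored signal (any per-instance score, for example a negative log-likelihood) are assumptions made for this example; the abstract does not specify them.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an increase in the stream mean.

    delta and threshold are illustrative tuning parameters (assumptions),
    not values taken from the paper.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # running minimum M_t of m_t

    def update(self, x):
        """Consume one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold


def first_drift(stream, **kwargs):
    """Return the index of the first alarm in stream, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 5 at index 200.
    data = [0.0] * 200 + [5.0] * 200
    print(first_drift(data, delta=0.1, threshold=20.0))
```

In a pipeline of the kind the abstract describes, an alarm like this would be the point at which the mixture model is re-estimated; how GDPC combines the test with the Chernoff bound is described in the paper itself and is not reproduced here.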
Pages: 3533-3547
Page count: 15