A Novel Clustering Framework for Stream Data (Un nouveau cadre de classifications pour les données de flux)

Cited by: 4
Authors
Zadeh, Hadi Tajali [1]
Boostani, Reza [1]
Affiliations
[1] Shiraz Univ, Fac Elect & Comp Engn, Comp Sci Engn & IT Dept, Shiraz 7134851154, Iran
Source
CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING-REVUE CANADIENNE DE GENIE ELECTRIQUE ET INFORMATIQUE | 2019, Vol. 42, No. 1
Keywords
CluStream; DenStream; incremental Naive Bayes (INB); online clustering; stream clustering
DOI
10.1109/CJECE.2018.2885326
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
There is a growing tendency toward real-time clustering of continuous stream data. In this regard, a few attempts have been made to improve the off-line phase of stream clustering methods, whereas these methods mostly use a simple distance function in their online phase. In practice, clusters have complex shapes, and therefore, measuring the distance of incoming samples to the mean of asymmetric microclusters might assign incoming samples to irrelevant microclusters. In this paper, a novel framework is proposed, which can enhance the online phase of all stream clustering methods. For each microcluster whose population exceeds a threshold, a classifier is exclusively trained to capture its boundary and statistical properties. Thus, incoming samples are assigned to the microclusters according to the classifiers' scores. Here, the incremental Naïve Bayes classifier is chosen, due to its fast learning property. DenStream and CluStream, as the state-of-the-art methods, were chosen and their performance was assessed over nine synthetic and real data sets, with and without applying the proposed framework. The comparative results in terms of purity, general recall, general precision, concept change traceability, computational complexity, and robustness against noise over the data sets imply the superiority of the modified methods over their original versions.
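The assignment rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `MicroCluster`, `assign`, `min_population`, and `var_floor` are hypothetical, the per-feature Gaussian likelihood is an assumed form of the incremental Naive Bayes model, and falling back to the nearest mean when no microcluster has reached the threshold is a simplification made for this example.

```python
import math


class MicroCluster:
    """Running per-feature mean and variance, updated one sample at a time."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # sum of squared deviations from the running mean

    def update(self, x):
        # Welford's incremental update: no past samples need to be stored.
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def log_score(self, x, total, var_floor=1e-3):
        # Assumed Gaussian naive Bayes score: log prior + per-feature log-likelihoods.
        score = math.log(self.n / total)
        for j, v in enumerate(x):
            var = max(self.m2[j] / self.n, var_floor)
            score += -0.5 * math.log(2 * math.pi * var) - (v - self.mean[j]) ** 2 / (2 * var)
        return score

    def sq_dist(self, x):
        # Squared Euclidean distance to the microcluster mean (the baseline rule).
        return sum((v - m) ** 2 for v, m in zip(x, self.mean))


def assign(x, clusters, min_population=5):
    """Return the index of the microcluster that should absorb sample x."""
    mature = [i for i, c in enumerate(clusters) if c.n >= min_population]
    if mature:
        # Microclusters above the population threshold compete on classifier score.
        total = sum(clusters[i].n for i in mature)
        return max(mature, key=lambda i: clusters[i].log_score(x, total))
    # Hypothetical fallback: plain nearest-mean assignment.
    return min(range(len(clusters)), key=lambda i: clusters[i].sq_dist(x))
```

With one microcluster stretched along the first feature and a second, compact microcluster nearby, a sample can lie closer to the compact microcluster's mean while still scoring higher under the elongated one, which is the situation the abstract says a simple distance function handles poorly.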
Pages: 27-33
Number of pages: 7
Related papers
21 entries in total
[1]  
Aggarwal C., 2003, 29 INT C VER LARG DA
[2]  
Aggarwal C., 2004, P 30 INT C VER LARG, V30, P852
[3]   On clustering large number of data streams [J].
Al Aghbari, Zaher ;
Kamel, Ibrahim ;
Awad, Thuraya .
INTELLIGENT DATA ANALYSIS, 2012, 16 (01) :69-91
[4]  
[Anonymous], 2001, ACM WORKSH DAT MIN A
[5]  
Bifet A, 2010, J MACH LEARN RES, V11, P1601
[6]   Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables [J].
Blackard, JA ;
Dean, DJ .
COMPUTERS AND ELECTRONICS IN AGRICULTURE, 1999, 24 (03) :131-151
[7]   Density-Based Clustering over an Evolving Data Stream with Noise [J].
Cao, Feng ;
Ester, Martin ;
Qian, Weining ;
Zhou, Aoying .
PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, :328-+
[8]  
Ester M., 1996, KDD-96 Proceedings. Second International Conference on Knowledge Discovery and Data Mining, P226
[9]   A single pass algorithm for clustering evolving data streams based on swarm intelligence [J].
Forestiero, Agostino ;
Pizzuti, Clara ;
Spezzano, Giandomenico .
DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 26 (01) :1-26
[10]  
Guha S., 1998, CURE, P73, DOI 10.1145/276305.276312