A buffer-based online clustering for evolving data stream

Cited by: 36
Authors
Islam, Md. Kamrul [1]
Ahmed, Md. Manjur [2]
Zamli, Kamal Z. [1]
Affiliations
[1] Univ Malaysia Pahang, Fac Comp Syst & Software Engn, Kuantan 26300, Pahang, Malaysia
[2] Univ Barisal, Dept Comp Sci & Engn, Kornokathi 8200, Barisal, Bangladesh
Keywords
Density-based clustering; Evolving data stream; Arbitrarily shaped cluster; Clustering graph; DENSITY; ALGORITHM;
DOI
10.1016/j.ins.2019.03.022
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Data stream clustering plays an important role in data stream mining for knowledge extraction. Numerous researchers have recently studied density-based clustering algorithms because of their capability to generate arbitrarily shaped clusters. However, most of these algorithms are fully offline, are hybrid online/offline, or cannot handle the evolving nature of a data stream. Recently, a fully online clustering algorithm for evolving data streams called CEDAS was proposed. However, like other density-based clustering algorithms, CEDAS requires the globally optimal radius of the micro-clusters to be predefined, which is a difficult task; in addition, an erroneous choice degrades clustering performance. Moreover, the algorithm ignores the presence of temporarily irrelevant micro-clusters, which may become relevant in the future. In this study, we present a fully online density-based clustering algorithm called buffer-based online clustering for evolving data stream (BOCEDS). This algorithm recursively updates the micro-cluster radius to its local optimum. It also introduces a buffer for storing irrelevant micro-clusters and a fully online pruning method for extracting temporarily irrelevant micro-clusters from the buffer. In addition, BOCEDS proposes an online micro-cluster energy-updating function based on the spatial information of the data stream. Experimental results are compared with those of CEDAS and of alternative hybrid online/offline density-based clustering algorithms, and the results indicate that BOCEDS outperforms these algorithms. The sensitivity of the clustering parameters is also measured. The proposed algorithm is then applied to real-world weather data streams to demonstrate its capability to detect changes in a data stream and to discover arbitrarily shaped clusters. The proposed BOCEDS is available at https://sites.google.com/view/md-manjur-ahmed and https://sites.google.com/view/kamrul-just. © 2019 Elsevier Inc. All rights reserved.
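The buffer idea described in the abstract can be illustrated with a small toy sketch. This is not the BOCEDS algorithm: the class and parameter names (`MicroCluster`, `BufferedStreamClusterer`, `init_radius`, `decay`, `max_idle`) are hypothetical, and the energy reset, the radius update, and the pruning rule are placeholder assumptions chosen only to show how an active list and a buffer of temporarily irrelevant micro-clusters might interact.

```python
from dataclasses import dataclass
import math


@dataclass(eq=False)  # compare by identity so list.remove() targets one object
class MicroCluster:
    center: list[float]
    radius: float
    energy: float = 1.0
    count: int = 1
    idle: int = 0  # steps spent in the buffer (hypothetical bookkeeping)


class BufferedStreamClusterer:
    """Toy illustration; every rule here is an assumption, not BOCEDS itself."""

    def __init__(self, init_radius=1.0, decay=0.1, max_idle=50):
        self.init_radius = init_radius
        self.decay = decay          # energy lost per arriving point (assumed)
        self.max_idle = max_idle    # buffer lifetime before pruning (assumed)
        self.active: list[MicroCluster] = []
        self.buffer: list[MicroCluster] = []

    @staticmethod
    def _nearest_covering(point, clusters):
        """Return the closest micro-cluster whose radius covers the point."""
        best, best_d = None, math.inf
        for mc in clusters:
            d = math.dist(point, mc.center)
            if d <= mc.radius and d < best_d:
                best, best_d = mc, d
        return best, best_d

    def learn_one(self, point):
        point = list(point)
        # 1. Age everything: active clusters lose energy, buffered ones idle.
        for mc in self.active:
            mc.energy -= self.decay
        for mc in self.buffer:
            mc.idle += 1
        # 2. Prefer an active cluster; otherwise try to revive a buffered one.
        mc, d = self._nearest_covering(point, self.active)
        if mc is None:
            mc, d = self._nearest_covering(point, self.buffer)
            if mc is not None:
                self.buffer.remove(mc)
                mc.idle = 0
                self.active.append(mc)
        if mc is None:
            self.active.append(MicroCluster(point, self.init_radius))
        else:
            mc.count += 1
            mc.energy = 1.0  # placeholder for an energy-updating function
            mc.center = [c + (x - c) / mc.count
                         for c, x in zip(mc.center, point)]
            # Placeholder recursive radius update (assumed, not from the paper).
            target = max(2.0 * d, 0.5 * self.init_radius)
            mc.radius += (target - mc.radius) / mc.count
        # 3. Move exhausted clusters to the buffer, then prune stale entries.
        still_active = []
        for m in self.active:
            if m.energy > 0:
                still_active.append(m)
            else:
                m.idle = 0
                self.buffer.append(m)
        self.active = still_active
        self.buffer = [m for m in self.buffer if m.idle <= self.max_idle]
```

In this sketch a micro-cluster whose energy runs out is parked in the buffer rather than deleted, so a later point falling inside it can bring it back with its accumulated center and count intact; only entries that stay idle longer than `max_idle` steps are discarded.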
Pages: 113-135
Number of pages: 23