Online Clustering of Evolving Data Streams Using a Density Grid-Based Method

Cited by: 38
Authors
Tareq, Mustafa [1]
Sundararajan, Elankovan A. [1]
Mohd, Masnizah [2]
Sani, Nor Samsiah [2]
Affiliations
[1] Univ Kebangsaan Malaysia, Ctr Software Technol & Management, Fac Informat Sci & Technol, Bangi 43600, Malaysia
[2] Univ Kebangsaan Malaysia, Ctr Artificial Intelligence Technol, Fac Informat Sci & Technol, Bangi 43600, Malaysia
Keywords
Clustering algorithms; Real-time systems; Memory management; Software; Shape; Sensors; Social network services; Clustering; data stream; evolving; grid-based method; core-micro-cluster; online; BIG DATA; ITERATIVE FUSION; DATA ANALYTICS; INTERNET; ALGORITHM; THINGS; IOT
DOI
10.1109/ACCESS.2020.3021684
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, the availability of data from persistent data streams has increased significantly. These data streams evolve continually, and their clusters frequently form arbitrary rather than regular shapes in the data space. This characteristic leads to an exponential increase in the processing time of traditional clustering algorithms for data streams. In this study, we propose a new online density grid-based method for data stream clustering. Its primary objectives are to reduce the number of distance function calls and to improve cluster quality. The method runs entirely online and consists of two main phases. The first phase generates the Core Micro-Clusters (CMCs), and the second phase combines the CMCs into macro clusters. The grid is used as an outlier buffer to handle multi-density data and noise. The method was tested on real and synthetic data streams using different quality metrics and was compared with a popular method for clustering evolving data streams into arbitrary shapes. The results show that the proposed method is an effective solution for reducing the number of calls to the distance function and improving cluster quality.
Pages: 166472-166490
Page count: 19
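
The two-phase idea in the abstract (find dense regions first, then merge them into larger clusters, with sparse grid cells holding potential outliers) can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified batch version that uses grid cells in place of Core Micro-Clusters, and the names `cell_of`, `grid_cluster`, `cell_size`, and `min_points` are hypothetical. It only shows how a grid lookup can replace pairwise distance computations.

```python
from collections import defaultdict
from itertools import product


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_size) for x in point)


def grid_cluster(points, cell_size=1.0, min_points=3):
    """Group points into clusters of adjacent dense grid cells.

    Illustrative only (assumed parameters, not taken from the paper).
    Returns (clusters, outliers): clusters is a list of point lists,
    outliers holds the points that fell into sparse cells.
    """
    # Phase 1: assign each point to a cell; no pairwise distances needed.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, cell_size)].append(p)

    dense = {c for c, pts in cells.items() if len(pts) >= min_points}
    # Sparse cells act as a simple outlier buffer.
    outliers = [p for c, pts in cells.items() if c not in dense for p in pts]

    # Phase 2: merge neighbouring dense cells with a flood fill.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            c = stack.pop()
            members.extend(cells[c])
            for offset in product((-1, 0, 1), repeat=len(c)):
                n = tuple(a + b for a, b in zip(c, offset))
                if n in dense and n not in seen:
                    seen.add(n)
                    stack.append(n)
        clusters.append(members)
    return clusters, outliers
```

A truly online variant would update cell counts as each point arrives and would age out old points; both are omitted here to keep the sketch short.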