Varying density method for data stream clustering

Cited by: 7
Authors
Mousavi, Maryam [1, 2]
Khotanlou, Hassan [1]
Abu Bakar, Azuraliza [2]
Vakilian, Mohammadmahdi [3]
Affiliations
[1] Bu-Ali Sina University, Faculty of Engineering, Department of Computer Engineering, Hamadan, Hamadan, Iran
[2] Universiti Kebangsaan Malaysia, Faculty of Information Science and Technology, Center for Artificial Intelligence Technology, Bangi, Selangor, Malaysia
[3] Islamic Azad University, Faculty of Engineering, Department of Electrical Engineering, Hamedan Branch, Hamadan, Hamadan, Iran
Keywords
Data stream; Density-based clustering; Merging; Pruning; Varying density; Algorithm
DOI
10.1016/j.asoc.2020.106797
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a new online-offline density-based clustering method for data streams with varying density is proposed. In the online phase, a summary of the data (known as micro-clusters) is created; in the offline phase, this synopsis is used to form the final clusters. The goal of the online phase is to find accurate micro-clusters. When a new data point arrives, finding the nearest, best-fitting micro-cluster is time-consuming and increases the execution time. To address this problem, a new merging algorithm is proposed. To keep the number of micro-clusters limited, a pruning process is applied alongside the summarization process. In existing methods, this pruning process takes too long to remove micro-clusters that rarely receive new objects, which increases memory usage. To solve this problem, a new pruning algorithm is introduced. Another problem with density-based methods is that they apply global parameters to data sets with varying density, which can dramatically decrease clustering quality. In our work, a new density-based algorithm that relies only on the MinPts parameter is proposed to create the final clusters and to improve clustering quality on data sets with varying density. Performance evaluation on both synthetic and real data sets illustrates the efficiency and effectiveness of the proposed method. The experimental results show that our method can improve clustering quality on data sets with varying density while keeping time and memory usage limited. (C) 2020 Elsevier B.V. All rights reserved.
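The abstract describes the online-offline framework only at a high level. The code below is a minimal sketch of that framework and rests on several assumptions. It uses a DenStream-style micro-cluster (decayed linear sum, squared sum and weight, with fading factor 2^(-lam * elapsed)). In the online step, each point is absorbed into its nearest micro-cluster if the radius stays within `eps`. Pruning drops micro-clusters whose decayed weight falls below a threshold. For the offline step, a mutual k-nearest-neighbour graph with k = MinPts is swapped in as a simple MinPts-only stand-in. None of this is the authors' merging, pruning or final-clustering algorithm. All names (`MicroCluster`, `insert`, `prune`, `offline_clusters`, `eps`, `lam`, `min_weight`) are hypothetical.

```python
import copy
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical DenStream-style summary; not the paper's exact structure."""
    ls: list            # decayed linear sum of absorbed points, per dimension
    ss: list            # decayed sum of squares, per dimension
    weight: float       # decayed number of absorbed points
    last_update: float  # time of the last fade/absorb

    def center(self):
        return [x / self.weight for x in self.ls]

    def radius(self):
        # Square root of the summed per-dimension variance of absorbed points.
        var = sum(s / self.weight - (l / self.weight) ** 2
                  for l, s in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def fade(self, now, lam):
        # Exponential decay 2^(-lam * elapsed) applied to all statistics.
        f = 2.0 ** (-lam * (now - self.last_update))
        self.ls = [x * f for x in self.ls]
        self.ss = [x * f for x in self.ss]
        self.weight *= f
        self.last_update = now

    def absorb(self, p, now, lam):
        self.fade(now, lam)
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss = [a + b * b for a, b in zip(self.ss, p)]
        self.weight += 1.0


def insert(mcs, p, now, eps, lam):
    """Online step: absorb p into its nearest micro-cluster if the radius stays <= eps."""
    if mcs:
        # Linear scan: the per-point cost the abstract calls time-consuming.
        best = min(mcs, key=lambda m: math.dist(m.center(), p))
        trial = copy.deepcopy(best)
        trial.absorb(p, now, lam)
        if trial.radius() <= eps:
            best.absorb(p, now, lam)
            return
    mcs.append(MicroCluster(list(p), [x * x for x in p], 1.0, now))


def prune(mcs, now, lam, min_weight):
    """Drop micro-clusters whose decayed weight has fallen below min_weight."""
    for m in mcs:
        m.fade(now, lam)
    mcs[:] = [m for m in mcs if m.weight >= min_weight]


def offline_clusters(mcs, min_pts):
    """Offline stand-in: connected components of the mutual min_pts-NN graph over
    micro-cluster centers; singleton components are labelled -1 (noise)."""
    centers = [m.center() for m in mcs]
    n = len(centers)
    knn = [set(sorted((j for j in range(n) if j != i),
                      key=lambda j: math.dist(centers[i], centers[j]))[:min_pts])
           for i in range(n)]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # keep only mutual-neighbour edges
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    size = Counter(roots)
    return [r if size[r] > 1 else -1 for r in roots]


if __name__ == "__main__":
    mcs = []
    stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    for t, p in enumerate(stream):
        insert(mcs, p, now=t, eps=1.0, lam=0.01)
    print(len(mcs))  # 2
```

Because the mutual k-NN relation only links micro-clusters that are among each other's `min_pts` nearest neighbours, the neighbourhood scale adapts to local density instead of relying on a global radius. This is the property the abstract seeks for varying-density data. The nearest-micro-cluster search in `insert` is a plain linear scan. The abstract presents its merging algorithm as the remedy for that cost.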
Pages: 14