MuDi-Stream: A multi density clustering algorithm for evolving data stream

Cited by: 64
Authors
Amini, Amineh [1]
Saboohi, Hadi [1]
Herawan, Tutut [1]
Teh Ying Wah [1]
Affiliation
[1] Univ Malaya UM, Dept Informat Syst, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
Keywords
Evolving data streams; Multi-density clusters; Core mini-clusters; Density grid
DOI
10.1016/j.jnca.2014.11.007
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Density-based methods have emerged as a worthwhile class of techniques for clustering data streams, and a number of density-based stream clustering algorithms have been developed recently. However, existing algorithms are not without problems: clustering quality decreases dramatically when the density of the data varies widely. In this paper, a new method called MuDi-Stream is developed. It is an online-offline algorithm with four main components. In the online phase, it keeps summary information about the evolving multi-density data stream in the form of core mini-clusters. The offline phase generates the final clusters using an adapted density-based clustering algorithm. A grid-based method is used as an outlier buffer to handle both noise and multi-density data, and also to reduce the merging time of clustering. The algorithm is evaluated on various synthetic and real-world datasets using different quality metrics, and scalability results are also compared. The experimental results show that the proposed method improves clustering quality in multi-density environments. (C) 2014 Elsevier Ltd. All rights reserved.
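The online phase described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the exponential fading function, the single fixed `radius`, the grid `cell` size, the `min_cell_points` promotion threshold, and all class and function names below are assumptions chosen for illustration. The sketch only shows the general idea of summarizing a stream as decaying mini-clusters while a grid buffers points that fit no existing summary; the offline clustering step is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MiniCluster:
    """Hypothetical stream summary: decayed point count and decayed linear sum."""
    dim: int
    weight: float = 0.0
    linear_sum: list = field(default_factory=list)
    last_update: float = 0.0

    def __post_init__(self):
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim

    def decay(self, now, lam):
        # Assumed fading function 2^(-lam * dt); the abstract does not specify one.
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_update = now

    def insert(self, point, now, lam):
        self.decay(now, lam)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def center(self):
        return [s / self.weight for s in self.linear_sum]


def online_step(point, now, clusters, grid_buffer,
                radius=1.0, lam=0.01, cell=1.0, min_cell_points=3):
    """Absorb the point into the nearest mini-cluster, or buffer it in a grid cell."""
    best, best_dist = None, math.inf
    for mc in clusters:
        if mc.weight > 0:
            d = math.dist(mc.center(), point)
            if d < best_dist:
                best, best_dist = mc, d
    if best is not None and best_dist <= radius:
        best.insert(point, now, lam)
        return "absorbed"

    # The grid acts as an outlier buffer: points wait here until a cell is dense enough.
    key = tuple(math.floor(x / cell) for x in point)
    grid_buffer.setdefault(key, []).append(point)
    if len(grid_buffer[key]) >= min_cell_points:
        mc = MiniCluster(dim=len(point), last_update=now)
        for p in grid_buffer.pop(key):
            mc.insert(p, now, lam)
        clusters.append(mc)
        return "promoted"
    return "buffered"
```

A single global `radius` is a simplification made here for brevity; handling clusters of different densities would need a density-dependent threshold, which this sketch does not attempt.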
Pages: 370-385
Page count: 16