On Density-Based Data Streams Clustering Algorithms: A Survey

Cited by: 0
Authors
Amineh Amini
Teh Ying Wah
Hadi Saboohi
Affiliations
[1] Department of Information Systems
[2] Faculty of Computer Science and Information Technology
[3] University of Malaya
Source
Journal of Computer Science and Technology | 2014 / Vol. 29
Keywords
data stream; density-based clustering; grid-based clustering; micro-clustering
DOI
Not available
Abstract
Clustering data streams has drawn considerable attention in recent years because of their ever-growing presence. Data streams impose additional challenges on clustering, such as limited time and memory and the need to cluster in a single pass. Furthermore, discovering clusters of arbitrary shape is very important in data stream applications. Data streams are unbounded and evolve over time, and the number of clusters is not known in advance. In a data stream environment, noise also appears occasionally due to various factors. Density-based methods are a notable class of data stream clustering methods: they can discover clusters of arbitrary shape, detect noise, and do not need the number of clusters in advance. Because of the characteristics of data streams, traditional density-based clustering is not directly applicable. Recently, many density-based clustering algorithms have been extended to data streams. The main idea of these algorithms is to use density-based methods in the clustering process while overcoming the constraints imposed by the nature of data streams. The purpose of this paper is to shed light on some of the algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms for data streams and discuss their uniqueness and limitations, but also explain how they address the challenges of clustering data streams. Moreover, we investigate the evaluation metrics used to validate cluster quality and measure the algorithms' performance. It is hoped that this survey will serve as a stepping stone for researchers studying data stream clustering, particularly density-based algorithms.
Pages: 116-141
Page count: 25