K-DBSCAN: An efficient density-based clustering algorithm supports parallel computing

被引:0
作者
Deng C. [1 ,2 ]
Song J. [3 ]
Cai S. [1 ]
Sun R. [1 ]
Shi Y. [1 ]
Hao S. [1 ]
机构
[1] College of Information and Electrical Engineering, China Agricultural University, Beijing
[2] China Tobacco Guangxi Industrial Co., Ltd., No. 28 Beihunanlu, Xixiangtang District, Nanning
[3] National Space Science Center of CAS, No. 1 Nanertiao, Zhongguancun, Haidian district, Beijing
关键词
Clustering; Data mining; DBSCAN; Density-based; K-means; Parallel computing; Spatial data;
D O I
10.1504/IJSPM.2018.094740
中图分类号
学科分类号
摘要
DBSCAN is the most representative density-based clustering algorithm and has been widely used in many fields. However, the running time of DBSCAN is unacceptable in many actual applications. To improve its performance, this paper presents a new 2D density-based clustering algorithm, K-DBSCAN, which successfully reduces the computational complexity of the clustering process by a simplified k-mean partitioning process and a reachable partition index, and enables parallel computing by a divide-and-conquer method. The experiments show that K-DBSCAN achieves remarkable accuracy, efficiency and applicability compared with conventional DBSCAN algorithms especially in large-scale spatial density-based clustering. The time complexity of K-DBSCAN is O(N2/KC), where K is the number of data partitions, and C is the number of physical computing cores. © 2018 Inderscience Enterprises Ltd.
引用
收藏
页码:496 / 505
页数:9
相关论文
共 31 条
[1]  
Andrade G., Ramos G., Madeira D., Et al., G-dbscan: A GPU accelerated algorithm for density-based clustering, Procedia Computer Science, 18, pp. 369-378, (2013)
[2]  
Ankerst M., Breunig M.M., Kriegel H.P., Et al., OPTICS: Ordering points to identify the clustering structure, SIGMOD 1999, pp. 49-60, (1999)
[3]  
Beckmann N., Kriegel H.P., Schneider R., Et al., The R∗-tree: An efficient and robust access method for points and rectangles, ACM SIGMOD Record, 19, 2, pp. 322-331, (1990)
[4]  
Birant D., Kut A., ST-DBSCAN: An algorithm for clustering spatial-temporal data, Data and Knowledge Engineering, 60, 1, pp. 208-221, (2007)
[5]  
Bohm C., Berchtold S., Kriegel H.P., Et al., Multidimensional index structures in relational databases, Journal of Intelligent Information Systems, 15, 1, pp. 51-70, (2000)
[6]  
Bohm C., Noll R., Plant C., Et al., Density-based clustering using graphics processors, Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 661-670, (2009)
[7]  
Dua D., Karra Taniskidou E., UCI Machine Learning Repository, (2017)
[8]  
Ertoz L., Steinbach M., Kumar V., Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data, Proceedings of the 2003 SIAM International Conference on Data Mining, pp. 47-58, (2003)
[9]  
Ester M., Kriegel H.P., Sander J., Et al., A density-based algorithm for discovering clusters in large spatial databases with noise, KDD, 96, 34, pp. 226-231, (1996)
[10]  
Ester M., Kriegel H.P., Sander J., Et al., Incremental clustering for mining in a data warehousing environment, Proceedings of International Conference on Very Large Data Bases, pp. 323-333, (1998)