K-DBSCAN: An improved DBSCAN algorithm for big data

Cited by: 10

Authors:

Gholizadeh, Nahid [1]

Saadatfar, Hamid [1]

Hanafi, Nooshin [1]

Affiliation:

[1] Univ Birjand, Birjand, South Khorasan, Iran

Source:

JOURNAL OF SUPERCOMPUTING | 2021 / Vol. 77 / Issue 6

Keywords:

Data mining; Clustering; Big data; DBSCAN algorithm; K-means++ algorithm; CLUSTERING-ALGORITHM

DOI:

10.1007/s11227-020-03524-3

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Big data storage and processing are among the most important challenges today. Among data mining algorithms, DBSCAN is a widely used clustering method, but one of its most important drawbacks is its low execution speed. This study aims to accelerate DBSCAN so that it can process big datasets in an acceptable amount of time. To this end, the data were first partitioned into groups with the K-means++ algorithm, and DBSCAN was then applied to each group separately. As a result, the computational burden of DBSCAN was reduced and clustering speed increased significantly. Finally, clusters on group borders were merged where necessary. According to the experimental results, the proposed algorithm greatly reduced DBSCAN execution time (by 98% in the best case) with no significant change in the qualitative evaluation criteria for clustering.
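The three-stage pipeline in the abstract (K-means++ grouping, DBSCAN per group, merging of border clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmeans_pp_partition`, `dbscan` and `k_dbscan` are hypothetical, all neighbor searches are brute force, and the merge rule (join two clusters from different groups when a core point of one lies within `eps` of a core point of the other) is an assumption, since the abstract does not state the criterion.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_pp_partition(points: Sequence[Point], k: int,
                        iters: int = 10, seed: int = 0) -> List[int]:
    """Split points into at most k groups: k-means++ seeding, then Lloyd steps."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++ seeding: next center drawn with probability ~ D(x)^2.
        d2 = [min(math.dist(p, c) for c in centers) ** 2 for p in points]
        if sum(d2) == 0:
            break  # every point already coincides with a center
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    groups = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then move centers to means.
        groups = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(len(centers)):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return groups


def dbscan(points: Sequence[Point], eps: float,
           min_pts: int) -> Tuple[List[int], List[bool]]:
    """Plain O(n^2) DBSCAN. Returns (labels, is_core); label -1 marks noise."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through core points only
            for r in neighbors[stack.pop()]:
                if labels[r] == -1:
                    labels[r] = cluster
                    if is_core[r]:
                        stack.append(r)
        cluster += 1
    return labels, is_core


def k_dbscan(points: Sequence[Point], k: int, eps: float,
             min_pts: int) -> List[int]:
    """Hypothetical K-DBSCAN-style sketch: partition, local DBSCAN, merge."""
    groups = kmeans_pp_partition(points, k)
    labels = [-1] * len(points)
    core = [False] * len(points)
    next_id = 0
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        local, local_core = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab, c in zip(idx, local, local_core):
            core[i] = c
            if lab != -1:
                labels[i] = next_id + lab  # make cluster ids globally unique
        next_id += max(local, default=-1) + 1
    # Assumed merge rule: union-find over cluster ids, joining clusters from
    # different groups whose core points lie within eps of each other.
    parent = list(range(next_id))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if (core[i] and core[j] and groups[i] != groups[j]
                    and math.dist(points[i], points[j]) <= eps):
                parent[find(labels[i])] = find(labels[j])
    # Relabel merged clusters to consecutive ids in order of first appearance.
    remap: dict = {}
    return [-1 if lab == -1 else remap.setdefault(find(lab), len(remap))
            for lab in labels]
```

For example, `k_dbscan(points, k=4, eps=1.5, min_pts=3)` returns one label per point, with `-1` for noise. A point's neighbor count inside its own group can only be equal to or lower than in the full dataset, so a point that is core within its group is also core globally; the assumed merge rule therefore never links core points that plain DBSCAN would separate, although it may leave some clusters unmerged or mark as noise points that global DBSCAN would assign. The cross-group check is brute force here for clarity; a practical version would presumably restrict it to points near group boundaries.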


Pages: 6214-6235

Number of pages: 22

References (33 in total):

[1] Analysis of K-Means and K-Medoids Algorithm For Big Data [J]. Arora, Preeti; Deepali; Varshney, Shipra. 1ST INTERNATIONAL CONFERENCE ON INFORMATION SECURITY & PRIVACY 2015, 2016, 78: 507-512

[2] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027

[3] Efficient incremental density-based algorithm for clustering large datasets [J]. Bakr, Ahmad M.; Ghanem, Nagia M.; Ismail, Mohamed A. ALEXANDRIA ENGINEERING JOURNAL, 2015, 54 (04): 1147-1154

[4] Brown D, 2019, 2019 IEEE 9TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), P48, DOI 10.1109/CCWC.2019.8666548

[5] Big Data: A Survey [J]. Chen, Min; Mao, Shiwen; Liu, Yunhao. MOBILE NETWORKS & APPLICATIONS, 2014, 19 (02): 171-209

[6] BLOCK-DBSCAN: Fast clustering for large scale data [J]. Chen, Yewang; Zhou, Lida; Bouguila, Nizar; Wang, Cheng; Chen, Yi; Du, Jixiang. PATTERN RECOGNITION, 2021, 109

[7] Cheng YC, 2019, ROUTL RES EDUC, P1, DOI [10.4324/9780429425882, 10.1109/vtcspring.2019.8746552]

[8] Optimized big data K-means clustering using MapReduce [J]. Cui, Xiaoli; Zhu, Pingfei; Yang, Xin; Li, Keqiu; Ji, Changqing. JOURNAL OF SUPERCOMPUTING, 2014, 70 (03): 1249-1259

[9] CLUSTER SEPARATION MEASURE [J]. DAVIES, DL; BOULDIN, DW. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (02): 224-227

[10] Dogan Yunus, 2013, Machine Learning and Data Mining in Pattern Recognition. 9th International Conference, MLDM 2013. Proceedings: LNCS 7988, P246, DOI 10.1007/978-3-642-39712-7_19
