K-DBSCAN: An improved DBSCAN algorithm for big data

Cited by: 0
Authors
Nahid Gholizadeh
Hamid Saadatfar
Nooshin Hanafi
Affiliation
[1] University of Birjand
Source
The Journal of Supercomputing | 2021 / Vol. 77
Keywords
Data mining; Clustering; Big data; DBSCAN algorithm; K-means++ algorithm
DOI
Not available
Abstract
Storing and processing big data are among today's most important challenges. Among data mining algorithms, DBSCAN is a common clustering method, but one of its main drawbacks is its low execution speed. This study aims to accelerate DBSCAN so that it can handle big datasets in an acceptable amount of time. To this end, the data are first partitioned into groups with the K-means++ algorithm, and DBSCAN is then run on each group separately. This reduces the computational burden of DBSCAN and significantly increases clustering speed. Finally, border clusters are merged where necessary. In the reported experiments, the proposed algorithm greatly reduced DBSCAN execution time (by 98% in the best case) with no significant change in the qualitative clustering evaluation criteria.
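The three stages named in the abstract (K-means++ grouping, DBSCAN within each group, merging of border clusters) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`kmeanspp_groups`, `dbscan`, `k_dbscan`) and parameters are hypothetical, and the merge rule is an assumption, since the abstract does not say how border clusters are merged: here two clusters from different groups are joined when any pair of their points lies within `eps`. The merge scan is brute force for clarity and would need a spatial index to be practical on large data.

```python
import math
import random


def kmeanspp_groups(points, k, iters=20, seed=0):
    """Partition points into k groups: K-means++ seeding, then Lloyd iterations."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Pick the next center with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2)[0])
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def dbscan(points, idx, eps, min_pts):
    """Plain DBSCAN over the point indices in idx; returns {index: label}, -1 = noise."""
    labels, cluster = {}, 0

    def neighbors(i):
        return [j for j in idx if math.dist(points[i], points[j]) <= eps]

    for i in idx:
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue  # already belongs to a cluster
            unvisited = j not in labels
            labels[j] = cluster
            if unvisited:
                nj = neighbors(j)
                if len(nj) >= min_pts:  # j is a core point, keep expanding
                    queue.extend(nj)
        cluster += 1
    return labels


def k_dbscan(points, k, eps, min_pts):
    """Group with K-means++, run DBSCAN per group, then merge touching clusters."""
    assign = kmeanspp_groups(points, k)
    labels, offset = [-1] * len(points), 0
    for g in range(k):
        idx = [i for i, a in enumerate(assign) if a == g]
        local = dbscan(points, idx, eps, min_pts)
        for i, lab in local.items():
            if lab >= 0:
                labels[i] = lab + offset  # make cluster ids globally unique
        offset += max(local.values(), default=-1) + 1

    parent = list(range(offset))  # union-find over cluster ids

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Assumed merge rule: join clusters from different groups that come within eps.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if (assign[i] != assign[j] and labels[i] >= 0 and labels[j] >= 0
                    and math.dist(points[i], points[j]) <= eps):
                parent[find(labels[i])] = find(labels[j])
    return [find(lab) if lab >= 0 else -1 for lab in labels]
```

Calling `k_dbscan(points, k=2, eps=0.6, min_pts=3)` on a list of coordinate tuples returns one label per point, with `-1` marking noise. Because the merge rule here also fires on border points, the result can differ from a single global DBSCAN run in edge cases.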
Pages: 6214-6235
Page count: 21