Vectorized Highly Parallel Density-Based Clustering for Applications With Noise

Authors
Xavier, Joseph Arnold [1, 2]
Muriedas, Juan Pedro Gutierrez Hermosillo [3]
Nassyr, Stepan [1]
Sedona, Rocco [1]
Goetz, Markus [3]
Streit, Achim [3]
Riedel, Morris [1, 2]
Cavallaro, Gabriele [1, 2]
Affiliations
[1] Forschungszentrum Jülich, Jülich Supercomputing Centre (JSC), D-52428 Jülich, Germany
[2] University of Iceland, School of Engineering and Natural Sciences, IS-107 Reykjavik, Iceland
[3] Karlsruhe Institute of Technology, Scientific Computing Center (SCC), D-76344 Eggenstein-Leopoldshafen, Germany
Source
IEEE Access | 2024 / Volume 12
Keywords
Clustering algorithms; Single instruction multiple data; Vectors; Noise; High performance computing; Computational efficiency; Central Processing Unit; Time complexity; Merging; Indexing; High-performance computing; density-based clustering; vectorization; VHPDBSCAN; ALGORITHM; DATASETS; AVX-512
DOI
10.1109/ACCESS.2024.3507193
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering in data mining involves grouping similar objects into categories based on their characteristics. As data volumes grow and high-performance computing advances, algorithms are needed that can process these computations efficiently and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve optimal performance, algorithms must be adapted for better compatibility with vector operations. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two datasets of differing dimensionality. We have also parallelized the computations that are essential for efficient workload distribution, which significantly enhances performance on higher-dimensional datasets. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. Owing to the reduced runtime, the application's total energy consumption compared to HPDBSCAN is reduced by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU.
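The abstract attributes the speedup to SIMD-accelerated distance computations but does not show the kernel. As a rough illustration only, the sketch below shows one common way to make a DBSCAN-style epsilon-neighborhood count friendly to vector units. It is not the VHPDBSCAN implementation: the function name `count_eps_neighbors`, the dimension-major data layout, and the scratch buffer are all assumptions made for this example, and the code relies on compiler auto-vectorization rather than explicit intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from VHPDBSCAN: counts how many of the
 * n candidate points lie within distance eps of a query point.
 * Assumed layout: dimension-major (structure of arrays), i.e.
 * coords[d * n + i] is coordinate d of point i, so the inner loop walks
 * contiguous memory, which compilers can usually auto-vectorize. */
size_t count_eps_neighbors(const float *coords, size_t n, size_t dim,
                           const float *query, float eps,
                           float *sq_dist /* scratch buffer, length n */)
{
    assert(dim > 0 && eps >= 0.0f);

    for (size_t i = 0; i < n; ++i)
        sq_dist[i] = 0.0f;

    /* Accumulate squared differences one dimension at a time. */
    for (size_t d = 0; d < dim; ++d) {
        const float *col = coords + d * n;
        const float q = query[d];
        for (size_t i = 0; i < n; ++i) {
            const float diff = col[i] - q;
            sq_dist[i] += diff * diff;
        }
    }

    /* Compare against eps^2 so no square root is needed per point. */
    const float eps_sq = eps * eps;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += (sq_dist[i] <= eps_sq);
    return count;
}
```

For example, with the four 2-D points (0, 0), (1, 0), (3, 4), and (0.5, 0.5) stored dimension-major and a query at the origin, a call with `eps = 1.0f` counts three neighbors (the query's own location included). Whether the loops are actually vectorized depends on the compiler and its flags, so the generated code should be inspected on the target processor.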
Pages: 181679-181692
Number of pages: 14