Vectorized Highly Parallel Density-Based Clustering for Applications With Noise

Cited by: 0
Authors
Xavier, Joseph Arnold [1, 2]
Muriedas, Juan Pedro Gutierrez Hermosillo [3]
Nassyr, Stepan [1]
Sedona, Rocco [1]
Goetz, Markus [3]
Streit, Achim [3]
Riedel, Morris [1, 2]
Cavallaro, Gabriele [1, 2]
Affiliations
[1] Forschungszentrum Julich, Julich Supercomp Ctr JSC, D-52428 Julich, Germany
[2] Univ Iceland, Sch Engn & Nat Sci, IS-107 Reykjavik, Iceland
[3] Karlsruhe Inst Technol, Sci Comp Ctr SCC, D-76344 Eggenstein Leopoldshafen, Germany
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Clustering algorithms; Single instruction multiple data; Vectors; Noise; High performance computing; Computational efficiency; Central Processing Unit; Time complexity; Merging; Indexing; High-performance computing; density-based clustering; vectorization; VHPDBSCAN; ALGORITHM; DATASETS; AVX-512
DOI
10.1109/ACCESS.2024.3507193
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering in data mining involves grouping similar objects into categories based on their characteristics. As data volumes continue to grow and high-performance computing advances, a critical need has emerged for algorithms that can process these computations efficiently and exploit the various levels of parallelism offered by modern supercomputing systems. Exploiting Single Instruction Multiple Data (SIMD) instructions enhances parallelism at the instruction level and minimizes data movement within the memory hierarchy. To fully harness a processor's SIMD capabilities and achieve optimal performance, algorithms must be adapted for better compatibility with vector operations. In this paper, we introduce a vectorized implementation of the Density-based Clustering for Applications with Noise (DBSCAN) algorithm suitable for execution on both shared and distributed memory systems. By leveraging SIMD, we enhance the performance of distance computations. Our proposed Vectorized HPDBSCAN (VHPDBSCAN) demonstrates a performance improvement of up to two times over the state-of-the-art parallel version, Highly Parallel DBSCAN (HPDBSCAN), on the ARM-based A64FX processor on two different datasets with varying dimensions. We have also parallelized the computations that are essential for efficient workload distribution, which significantly enhances performance on higher-dimensional datasets. Additionally, we evaluate VHPDBSCAN's energy consumption on the A64FX and Intel Xeon processors. The results show that, owing to the reduced runtime, the application's total energy consumption compared to HPDBSCAN is reduced on both processors: by 50% on the A64FX Central Processing Unit (CPU) and by approximately 19% on the Intel Xeon 8368 CPU.
Pages: 181679-181692
Number of pages: 14