Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

Cited by: 6
Authors
Daiss, Gregor [1]
Diehl, Patrick [2]
Kaiser, Hartmut [2]
Pflueger, Dirk [1]
Affiliations
[1] Univ Stuttgart, Stuttgart, Germany
[2] Louisiana State Univ, Baton Rouge, LA USA
Source
PROCEEDINGS OF THE 2023 INTERNATIONAL WORKSHOP ON OPENCL, IWOCL 2023 | 2023
Keywords
SYCL; Kokkos; HPX; AMT; GPU; CUDA; HIP; SIMD
DOI
10.1145/3585341.3585354
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: Given the heterogeneity of available accelerator cards within current supercomputers, portability is a key aspect for modern HPC applications. In Octo-Tiger, an astrophysics application simulating binary star systems and stellar mergers, we rely on Kokkos and its various execution spaces for portable compute kernels. In turn, we use HPX, a distributed task-based runtime system, to coordinate kernel launches, CPU tasks, and communication. This combination allows us to have a fine interleaving between portable CPU/GPU computations and communication, enabling scalability on various supercomputers. However, for HPX and Kokkos to work together optimally, we need to be able to treat Kokkos kernels as HPX tasks. Otherwise, instead of integrating asynchronous Kokkos kernel launches into HPX's task graph, we would have to actively wait for them with fence commands, which wastes CPU time better spent otherwise. Using an integration layer called HPX-Kokkos, treating Kokkos kernels as tasks already works for some Kokkos execution spaces (like the CUDA one), but not for others (like the SYCL one). In this work, we started making Octo-Tiger and HPX itself compatible with SYCL. To do so, we introduce numerous software changes, most notably an HPX-SYCL integration. This integration allows us to treat SYCL events as HPX tasks, which in turn allows us to better integrate Kokkos by extending the support of HPX-Kokkos to also fully support Kokkos' SYCL execution space. We show two ways to implement this HPX-SYCL integration and test them using Octo-Tiger and its Kokkos kernels, on both an NVIDIA A100 and an AMD MI100. We find modest, yet noticeable, speedups (1.11x to 1.15x for the relevant configurations) by enabling this integration, even when just running simple single-node scenarios with Octo-Tiger where communication and CPU utilization are not yet an issue. We further find that the integration using event polling within the HPX scheduler works far better than the alternative implementation using SYCL host tasks.
Pages: 12
Related papers
28 in total
  • [1] Alpay A., 2020, P INT WORKSH OPENCL, P1
  • [2] Bauer M, 2012, INT CONF HIGH PERFOR
  • [3] PaRSEC: Exploiting Heterogeneity to Enhance Scalability
    Bosilca, George
    Bouteiller, Aurelien
    Danalis, Anthony
    Faverge, Mathieu
    Herault, Thomas
    Dongarra, Jack J.
    [J]. COMPUTING IN SCIENCE & ENGINEERING, 2013, 15 (06) : 36 - 45
  • [4] Parallel programmability and the Chapel language
    Chamberlain, B. L.
    Callahan, D.
    Zima, H. P.
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2007, 21 (03) : 291 - 312
  • [5] An Experimental Study of SYCL Task Graph Parallelism for Large-Scale Machine Learning Workloads
    Chiu, Cheng-Hsiang
    Lin, Dian-Lun
    Huang, Tsung-Wei
    [J]. EURO-PAR 2021: PARALLEL PROCESSING WORKSHOPS, 2022, 13098 : 468 - 479
  • [6] From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types
    Daiss, Gregor
    Singanaboina, Srinivas Yadav
    Diehl, Patrick
    Kaiser, Hartmut
    Pflueger, Dirk
    [J]. 2022 IEEE/ACM 7TH INTERNATIONAL WORKSHOP ON EXTREME SCALE PROGRAMMING MODELS AND MIDDLEWARE (ESPM2), 2022, : 10 - 19
  • [7] From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels
    Daiss, Gregor
    Diehl, Patrick
    Marcello, Dominic
    Kheirkhahan, Alireza
    Kaiser, Hartmut
    Pflueger, Dirk
    [J]. 2022 IEEE/ACM INTERNATIONAL WORKSHOP ON PERFORMANCE, PORTABILITY AND PRODUCTIVITY IN HPC (P3HPC), 2022, : 89 - 99
  • [8] Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX
    Daiss, Gregor
    Simberg, Mikael
    Reverdell, Auriane
    Biddiscombe, John
    Pollinger, Theresa
    Kaiser, Hartmut
    Pflueger, Dirk
    [J]. 2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 377 - 386
  • [9] From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions
    Daiss, Gregor
    Amini, Parsa
    Biddiscombe, John
    Diehl, Patrick
    Frank, Juhan
    Huck, Kevin
    Kaiser, Hartmut
    Marcello, Dominic
    Pfander, David
    Pflueger, Dirk
    [J]. PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019,
  • [10] Octo-Tiger's New Hydro Module and Performance Using HPX+CUDA on ORNL's Summit
    Diehl, Patrick
    Daiss, Gregor
    Marcello, Dominic
    Huck, Kevin
    Shiber, Sagiv
    Kaiser, Hartmut
    Frank, Juhan
    Clayton, Geoffrey C.
    Pflueger, Dirk
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2021), 2021, : 204 - 214