Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

Times cited: 6

Authors:

Daiss, Gregor [1]

Diehl, Patrick [2]

Kaiser, Hartmut [2]

Pflueger, Dirk [1]

Affiliations:

[1] Univ Stuttgart, Stuttgart, Germany

[2] Louisiana State Univ, Baton Rouge, LA USA

Source:

PROCEEDINGS OF THE 2023 INTERNATIONAL WORKSHOP ON OPENCL, IWOCL 2023 | 2023

Keywords:

SYCL; Kokkos; HPX; AMT; GPU; CUDA; HIP; SIMD

DOI:

10.1145/3585341.3585354

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: given the heterogeneity of accelerator cards in current supercomputers, portability is a key aspect of modern HPC applications. In Octo-Tiger, an astrophysics application simulating binary star systems and stellar mergers, we rely on Kokkos and its various execution spaces for portable compute kernels. In turn, we use HPX, a distributed task-based runtime system, to coordinate kernel launches, CPU tasks, and communication. This combination allows a fine-grained interleaving of portable CPU/GPU computations and communication, enabling scalability on various supercomputers. However, for HPX and Kokkos to work together optimally, we need to be able to treat Kokkos kernels as HPX tasks. Otherwise, instead of integrating asynchronous Kokkos kernel launches into HPX's task graph, we would have to actively wait for them with fence commands, wasting CPU time that could be spent on other work. With an integration layer called HPX-Kokkos, treating Kokkos kernels as tasks already works for some Kokkos execution spaces (like the CUDA one), but not for others (like the SYCL one). In this work, we take the first steps toward making Octo-Tiger and HPX itself compatible with SYCL. To do so, we introduce numerous software changes, most notably an HPX-SYCL integration. This integration allows us to treat SYCL events as HPX tasks, which in turn allows us to integrate Kokkos better by extending HPX-Kokkos to fully support Kokkos' SYCL execution space as well. We show two ways to implement this HPX-SYCL integration and test them using Octo-Tiger and its Kokkos kernels on both an NVIDIA A100 and an AMD MI100. Enabling this integration yields modest yet noticeable speedups (1.11x to 1.15x for the relevant configurations), even in simple single-node Octo-Tiger scenarios where communication and CPU utilization are not yet an issue. We further find that the integration using event polling within the HPX scheduler works far better than the alternative implementation using SYCL host tasks.
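The event-polling idea from the abstract (turning a SYCL event into a future without blocking a CPU thread) can be sketched in portable C++. This is a minimal, hypothetical model, not the HPX-SYCL code: `MockEvent` stands in for a `sycl::event`, `EventPoller` stands in for the hook a task scheduler would call between tasks, and `std::promise`/`std::future` stand in for HPX futures. All of these names are illustrative assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sycl::event. Real SYCL 2020 code would compare
// event.get_info<sycl::info::event::command_execution_status>() against
// sycl::info::event_command_status::complete instead of reading a flag.
struct MockEvent {
    std::shared_ptr<std::atomic<bool>> done =
        std::make_shared<std::atomic<bool>>(false);
    bool is_complete() const { return done->load(std::memory_order_acquire); }
};

// Hypothetical polling hook: a scheduler calls poll() between tasks instead
// of blocking a worker thread on a fence or event.wait().
class EventPoller {
public:
    // Registers an event; the returned future becomes ready once a later
    // poll() observes the event as complete.
    std::future<void> make_future(MockEvent ev) {
        std::promise<void> p;
        std::future<void> f = p.get_future();
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(Entry{std::move(ev), std::move(p)});
        return f;
    }

    // Checks each pending event once without blocking on it, fulfills the
    // promises of completed events, and returns how many are still pending.
    std::size_t poll() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->event.is_complete()) {
                it->promise.set_value();
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        return pending_.size();
    }

private:
    struct Entry {
        MockEvent event;
        std::promise<void> promise;
    };
    std::mutex mutex_;
    std::vector<Entry> pending_;
};

// Simulates one asynchronous kernel finishing on another thread while the
// "scheduler" loop keeps polling; returns the number of poll rounds needed.
std::size_t run_polling_demo() {
    EventPoller poller;
    MockEvent kernel_event;
    std::future<void> fut = poller.make_future(kernel_event);

    std::thread device([ev = kernel_event] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        ev.done->store(true, std::memory_order_release);
    });

    std::size_t rounds = 0;
    while (poller.poll() != 0) {
        ++rounds;  // a real scheduler would run other ready tasks here
        std::this_thread::yield();
    }
    // poll() already fulfilled the promise, so this check does not block.
    assert(fut.wait_for(std::chrono::seconds(0)) == std::future_status::ready);
    fut.get();
    device.join();
    return rounds;
}
```

In this sketch, no thread ever waits on the simulated kernel itself: completion is discovered whenever the scheduler happens to poll, after which the future is ready for any continuations. A host-task-based variant would presumably instead submit a SYCL `host_task` that depends on the kernel's event and sets the promise from inside that task.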

Pages: 12

Number of references: 28
