CPU-GPU Tuning for Modern Scientific Applications using Node-Level Heterogeneity

Cited by: 0

Authors:

Thavappiragasam, Mathialakan [1]

Kale, Vivek [2]

Affiliations:

[1] Argonne Natl Lab, Lemont, IL 60439 USA

[2] Sandia Natl Labs, Livermore, CA USA

Source:

2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC 2023) | 2023

Keywords:

inter-device concurrency; performance tuning; CUDA; OpenMP; supercomputer; GPU; CPU; workflows; AI/ML

DOI:

10.1109/HiPC58850.2023.00034

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Scientific applications must be tuned to run efficiently on supercomputers whose nodes combine a CPU (a general-purpose host processor) with GPUs (accelerator device processors). Conventional wisdom suggests tuning applications for the GPU and limiting the CPU to offloading computation to the GPU, given the CPU's relatively minuscule computational power. However, this approach is overly conservative for modern scientific applications, which include scientific workflows with real-time data constraints and AI/ML workloads with low numerical precision requirements. This work identifies new performance opportunities for such applications via CPU-GPU tuning, a strategy that unifies the tuning of CPU and GPU performance parameters. Applying CPU-GPU tuning to a dot product representative of these applications, run on the widely used Summit supercomputer, yields up to an 8.15x speedup. These results lay the groundwork for auto-tuning software for applications run on supercomputers with node-level heterogeneity.


Pages: 179-183

Page count: 5