CPU-GPU Tuning for Modern Scientific Applications using Node-Level Heterogeneity

Times cited: 0
Authors
Thavappiragasam, Mathialakan [1]
Kale, Vivek [2]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Sandia Natl Labs, Livermore, CA USA
Source
2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023 | 2023
Keywords
inter-device concurrency; performance tuning; CUDA; OpenMP; supercomputer; GPU; CPU; workflows; AI/ML;
DOI
10.1109/HiPC58850.2023.00034
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Scientific applications must be tuned for performance to run efficiently on supercomputers whose nodes have a CPU (a general-purpose host processor) and GPUs (accelerator device processors). Conventional wisdom suggests focusing the tuning of applications on the GPU and limiting the CPU to the role of offloading computation to the GPU, given the CPU's relatively minuscule computational power. However, this is overly conservative for modern scientific applications, which include those using scientific workflows with real-time data constraints and AI/ML with low numerical precision requirements. This work identifies new performance opportunities for modern scientific applications via CPU-GPU tuning, a strategy that unifies and integrates the tuning of CPU and GPU performance parameters. Applying CPU-GPU tuning to a dot product representative of these applications, run on the widely used Summit supercomputer, results in up to an 8.15x speedup. These results provide groundwork for auto-tuning software for applications run on supercomputers having node-level heterogeneity.
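As a rough illustration of the idea in the abstract, the sketch below splits a dot product between the host and a second worker, with the split ratio exposed as a tunable parameter. It is not the authors' implementation: the names `partial_dot`, `split_dot`, and `cpu_fraction` are hypothetical, and the "device" side is simulated with a host thread via `std::async` where a real code would launch a CUDA kernel or an OpenMP target region.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: dot product of x and y over the index range [begin, end).
double partial_dot(const std::vector<double>& x, const std::vector<double>& y,
                   std::size_t begin, std::size_t end) {
    const auto b = static_cast<std::ptrdiff_t>(begin);
    const auto e = static_cast<std::ptrdiff_t>(end);
    return std::inner_product(x.begin() + b, x.begin() + e, y.begin() + b, 0.0);
}

// Hypothetical tunable split: the first cpu_fraction of the elements is
// reduced on the calling (host) thread; the remainder goes to a stand-in
// "device" task. ASSUMPTION: a host thread replaces the GPU offload here.
// Both vectors are assumed to have the same length.
double split_dot(const std::vector<double>& x, const std::vector<double>& y,
                 double cpu_fraction) {
    const std::size_t n = x.size();
    // Clamp the tuning parameter to [0, 1] before computing the split index.
    const double f = std::max(0.0, std::min(1.0, cpu_fraction));
    const std::size_t split =
        std::min(n, static_cast<std::size_t>(f * static_cast<double>(n)));

    // Start the "device" share asynchronously so both shares run concurrently.
    auto device = std::async(std::launch::async, partial_dot,
                             std::cref(x), std::cref(y), split, n);
    const double host = partial_dot(x, y, 0, split);
    return host + device.get();
}
```

A tuner under these assumptions would time `split_dot` for several values of `cpu_fraction` and keep the fastest; a value of 0 corresponds to the conventional approach in which the CPU only offloads.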
Pages: 179-183
Page count: 5