CPU-GPU Tuning for Modern Scientific Applications using Node-Level Heterogeneity

Times cited: 0

Authors:

Thavappiragasam, Mathialakan [1]

Kale, Vivek [2]

Affiliations:

[1] Argonne National Laboratory, Lemont, IL 60439, USA

[2] Sandia National Laboratories, Livermore, CA, USA

Source:

2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC 2023), 2023

Keywords:

inter-device concurrency; performance tuning; CUDA; OpenMP; supercomputer; GPU; CPU; workflows; AI/ML

DOI:

10.1109/HiPC58850.2023.00034

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Scientific applications must be tuned for performance to run efficiently on supercomputers whose nodes have a CPU (i.e., a general-purpose host processor) and GPUs (i.e., accelerator device processors). Conventional wisdom suggests focusing application tuning on the GPU and limiting the CPU's role to offloading computation to the GPU, given the CPU's relatively minuscule computational power. However, this approach is overly conservative for modern scientific applications, which include those using scientific workflows with real-time data constraints and AI/ML with low numerical precision requirements. This work identifies new performance opportunities for modern scientific applications via CPU-GPU tuning, a strategy that unifies and integrates the tuning of CPU and GPU performance parameters. Applying CPU-GPU tuning to a dot product representative of these applications, run on the widely used Summit supercomputer, yields a speedup of up to 8.15x. These results lay the groundwork for auto-tuning software for applications run on supercomputers with node-level heterogeneity.
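The abstract does not spell out how CPU and GPU parameters are tuned together. As a hedged illustration only, the Python sketch below tunes one parameter such a strategy could plausibly expose: the fraction of a dot product kept on the CPU while the remainder runs concurrently on the accelerator. The names `partial_dot`, `hybrid_dot`, and `tune_cpu_fraction`, the exhaustive search over candidate fractions, and the use of a one-worker thread pool as a stand-in for the GPU are all assumptions, not the paper's code; a real implementation would launch a CUDA kernel or an OpenMP `target` region asynchronously in place of the pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def partial_dot(x, y, begin, end):
    """Dot product over the index range [begin, end)."""
    return sum(a * b for a, b in zip(x[begin:end], y[begin:end]))


def hybrid_dot(x, y, cpu_fraction, device_pool):
    """Split one dot product between a host share and a 'device' share.

    The first cpu_fraction of the elements are reduced on the calling (host)
    thread while the remainder is submitted to device_pool. ASSUMPTION:
    device_pool is an ordinary thread pool standing in for a GPU.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    split = int(cpu_fraction * len(x))
    # Launch the device share first so the host share can overlap with it.
    device_future = device_pool.submit(partial_dot, x, y, split, len(x))
    host_part = partial_dot(x, y, 0, split)
    return host_part + device_future.result()


def tune_cpu_fraction(x, y, candidates, repeats=3):
    """Time every candidate split (exhaustive search) and return the fastest."""
    if not candidates:
        raise ValueError("need at least one candidate fraction")
    best_fraction, best_time = None, float("inf")
    with ThreadPoolExecutor(max_workers=1) as device_pool:
        for fraction in candidates:
            start = time.perf_counter()
            for _ in range(repeats):
                hybrid_dot(x, y, fraction, device_pool)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best_fraction, best_time = fraction, elapsed
    return best_fraction


if __name__ == "__main__":
    n = 100_000
    x = [1.0] * n
    y = [2.0] * n
    fraction = tune_cpu_fraction(x, y, [0.0, 0.25, 0.5, 0.75, 1.0])
    print("fastest CPU fraction:", fraction)
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(hybrid_dot(x, y, fraction, pool))  # 200000.0
```

In CPython the global interpreter lock serializes the two pure-Python reductions, so the timings here exercise the tuning loop rather than demonstrate real overlap. The parts intended to carry over to a CUDA or OpenMP version are the split point and the empirical search over it.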

Pages: 179-183

Number of pages: 5
