CPU-GPU Tuning for Modern Scientific Applications using Node-Level Heterogeneity

Cited by: 0
Authors
Thavappiragasam, Mathialakan [1]
Kale, Vivek [2]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] Sandia Natl Labs, Livermore, CA USA
Source
2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023 | 2023
Keywords
inter-device concurrency; performance tuning; CUDA; OpenMP; supercomputer; GPU; CPU; workflows; AI/ML
DOI
10.1109/HiPC58850.2023.00034
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scientific applications must be tuned to run efficiently on supercomputers whose nodes combine a CPU (a general-purpose host processor) with GPUs (accelerator device processors). Conventional wisdom suggests tuning applications for the GPU and limiting the CPU to offloading computation to the GPU, given the CPU's comparatively minuscule computational power. However, this approach is overly conservative for modern scientific applications, which include scientific workflows with real-time data constraints and AI/ML workloads with low numerical precision requirements. This work identifies new performance opportunities for such applications via CPU-GPU tuning, a strategy that unifies the tuning of CPU and GPU performance parameters. Applying CPU-GPU tuning to a dot product representative of these applications, run on the widely used Summit supercomputer, yields up to an 8.15x speedup. These results provide groundwork for auto-tuning software for applications run on supercomputers with node-level heterogeneity.
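The abstract does not list the specific performance parameters that were tuned, so the following is only an illustrative sketch of the general idea of sharing a dot product between two processors. It is not the authors' implementation. The names `partial_dot`, `split_dot`, and `cpu_fraction` are hypothetical, and two Python threads stand in for the CPU and GPU workers; a real version would presumably use OpenMP on the host and a CUDA kernel on the device.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_dot(x, y, start, stop):
    """Dot product of the slices x[start:stop] and y[start:stop]."""
    return sum(a * b for a, b in zip(x[start:stop], y[start:stop]))


def split_dot(x, y, cpu_fraction):
    """Split a dot product between two workers.

    cpu_fraction is an assumed tuning knob (not taken from the paper):
    the share of elements handled by the "CPU" worker. The remaining
    elements go to a second worker standing in for the GPU.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must lie in [0, 1]")

    split = int(len(x) * cpu_fraction)  # index where the CPU share ends
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_part = pool.submit(partial_dot, x, y, 0, split)
        gpu_part = pool.submit(partial_dot, x, y, split, len(x))
        # Both partial sums run concurrently; combine them at the end.
        return cpu_part.result() + gpu_part.result()
```

Setting `cpu_fraction` to 0.0 corresponds to the conventional GPU-only approach described in the abstract. Sweeping the value and timing each run is one simple way such a parameter could be tuned.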
Pages: 179-183
Page count: 5