Hybrid CPU-GPU scheduling and execution of tree traversals

Cited by: 1

Authors:

Liu, Jianqiao [1]

Hegde, Nikhil [1]

Kulkarni, Milind [1]

Affiliations:

[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA

Source:

ACM SIGPLAN NOTICES | 2016 / Vol. 51 / No. 08

Keywords:

Heterogeneous architectures; Scheduling; Irregular applications; Tree traversal

DOI:

10.1145/2851141.2851174

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

GPUs offer the promise of massive, power-efficient parallelism. However, exploiting this parallelism requires code to be carefully structured to deal with the limitations of the SIMT execution model. In recent years, there has been much interest in mapping irregular applications, that is, applications with unpredictable, data-dependent behaviors, to GPUs. While most of the work in this space has focused on ad hoc implementations of specific algorithms, recent work has looked at generic techniques for mapping a large class of tree traversal algorithms to GPUs by carefully restructuring the traversals to make them behave more regularly. Unfortunately, even this general approach to GPU execution of tree traversal algorithms relies on ad hoc, handwritten, algorithm-specific scheduling (i.e., assignment of threads to warps) to achieve high performance. The key challenge of scheduling is that it is a highly irregular process that requires inspecting thread behavior and then carefully sorting the threads into warps. In this paper, we present a novel scheduling and execution technique for tree traversal algorithms that is both general and automatic. The key novelty is a hybrid approach: the GPU partially executes tasks to inspect thread behavior and transmits information back to the CPU, which uses that information to perform the scheduling itself, before executing the remaining, carefully scheduled portion of the traversals on the GPU. We applied this framework to five tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases our hybrid approach delivers even better performance than GPU code that uses hand-tuned, application-specific scheduling.
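The inspect-then-schedule idea in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the tree is an implicit binary tree over the interval [0, 1), the "GPU inspection" phase is simulated on the CPU by descending a few levels, and the names `partial_signature`, `schedule`, and `divergence` are hypothetical.

```python
def partial_signature(query, depth):
    """Stand-in for the partial (inspection) traversal: descend `depth`
    levels of an implicit binary tree over [0, 1) and encode the
    left/right decisions as bits of an integer."""
    lo, hi, sig = 0.0, 1.0, 0
    for _ in range(depth):
        mid = (lo + hi) / 2
        go_right = query >= mid
        sig = (sig << 1) | int(go_right)
        lo, hi = (mid, hi) if go_right else (lo, mid)
    return sig


def schedule(queries, depth, warp_size):
    """Stand-in for the CPU scheduling step: sort query indices by their
    inspected signature, then cut the sorted order into warps."""
    sigs = [partial_signature(q, depth) for q in queries]
    order = sorted(range(len(queries)), key=lambda i: sigs[i])
    return [order[i:i + warp_size] for i in range(0, len(order), warp_size)]


def divergence(warps, queries, depth):
    """Number of distinct partial paths in each warp (1 means no divergence
    over the inspected levels)."""
    return [len({partial_signature(queries[i], depth) for i in w}) for w in warps]


queries = [0.9, 0.1, 0.8, 0.2, 0.95, 0.05, 0.7, 0.3]
naive_warps = [[0, 1, 2, 3], [4, 5, 6, 7]]           # input order, no scheduling
sorted_warps = schedule(queries, depth=1, warp_size=4)

print(divergence(naive_warps, queries, depth=1))      # paths per warp, unscheduled
print(divergence(sorted_warps, queries, depth=1))     # paths per warp, scheduled
```

In this toy setting, grouping queries with the same partial path into one warp reduces the number of distinct paths per warp, which is the kind of regularity the scheduling step aims for. A real system would run the inspection and the remaining traversal as GPU kernels and would need to account for the cost of transferring the inspection results to the CPU.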

Citation

Pages: 385-386

Number of pages: 2

