Hybrid CPU-GPU scheduling and execution of tree traversals

Times cited: 1
Authors
Liu, Jianqiao [1]
Hegde, Nikhil [1]
Kulkarni, Milind [1]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Keywords
Heterogeneous architectures; Scheduling; Irregular applications; Tree traversal
DOI
10.1145/2851141.2851174
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
GPUs offer the promise of massive, power-efficient parallelism. However, exploiting this parallelism requires code to be carefully structured to deal with the limitations of the SIMT execution model. In recent years, there has been much interest in mapping irregular applications to GPUs: applications with unpredictable, data-dependent behaviors. While most of the work in this space has focused on ad hoc implementations of specific algorithms, recent work has looked at generic techniques for mapping a large class of tree traversal algorithms to GPUs, through careful restructuring of the tree traversal algorithms to make them behave more regularly. Unfortunately, even this general approach for GPU execution of tree traversal algorithms is reliant on ad hoc, handwritten, algorithm-specific scheduling (i.e., assignment of threads to warps) to achieve high performance. The key challenge of scheduling is that it is a highly irregular process that requires the inspection of thread behavior and then careful sorting of the threads into warps. In this paper, we present a novel scheduling and execution technique for tree traversal algorithms that is both general and automatic. The key novelty is a hybrid approach: the GPU partially executes tasks to inspect thread behavior and transmits information back to the CPU, which uses that information to perform the scheduling itself, before executing the remaining, carefully scheduled, portion of the traversals on the GPU. We applied this framework to five tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases, our hybrid approach is able to deliver better performance even than GPU code that uses hand-tuned, application-specific scheduling.
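The inspect-then-schedule idea in the abstract can be illustrated with a small CPU-only simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the implicit binary tree over [0, 1), the use of a path prefix as the "thread behavior" signal, and the names `profile_prefix`, `schedule_warps`, and `distinct_paths_per_warp` are all hypothetical. The first function stands in for the partial GPU execution, the second for the CPU-side sorting of threads into warps, and the third counts how many different paths share a warp, as a rough proxy for divergence.

```python
from typing import List, Sequence, Tuple

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def profile_prefix(query: float, depth: int) -> Tuple[int, ...]:
    """Hypothetical stand-in for the partial GPU execution: record the
    left (0) / right (1) choices a query makes in the top `depth` levels
    of an implicit binary tree that bisects the interval [0, 1)."""
    lo, hi = 0.0, 1.0
    path = []
    for _ in range(depth):
        mid = (lo + hi) / 2
        if query < mid:
            path.append(0)
            hi = mid
        else:
            path.append(1)
            lo = mid
    return tuple(path)


def schedule_warps(prefixes: Sequence[Tuple[int, ...]],
                   warp_size: int = WARP_SIZE) -> List[List[int]]:
    """Hypothetical stand-in for the CPU scheduling step: sort traversal
    indices by their observed prefix, then cut the sorted order into
    warp-sized groups so that similar traversals share a warp."""
    order = sorted(range(len(prefixes)), key=lambda i: prefixes[i])
    return [order[i:i + warp_size] for i in range(0, len(order), warp_size)]


def distinct_paths_per_warp(warps: Sequence[Sequence[int]],
                            prefixes: Sequence[Tuple[int, ...]]) -> List[int]:
    """Count the distinct prefixes inside each warp (1 means every thread
    in the warp took the same path through the profiled levels)."""
    return [len({prefixes[i] for i in warp}) for warp in warps]
```

Comparing `distinct_paths_per_warp` for the scheduled warps against warps formed in input order gives a quick sense of how much the sort reduces mixing of traversal paths. In a real system the profiling would run as a GPU kernel and the remaining traversal would be launched with the reordered thread assignment.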
Pages: 385-386
Page count: 2
Related papers
38 in total
  • [11] The Unicorn Runtime: Efficient Distributed Shared Memory Programming for Hybrid CPU-GPU Clusters
    Beri, Tarun
    Bansal, Sorav
    Kumar, Subodh
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (05) : 1518 - 1534
  • [12] Supporting Energy-Efficient Computing on Heterogeneous CPU-GPU Architectures
    Siehl, Kyle
    Zhao, Xinghui
    2017 IEEE 5TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD (FICLOUD 2017), 2017, : 134 - 141
  • [13] MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform
    Jiang, Jiazhi
    Tian, Rui
    Du, Jiangsu
    Huang, Dan
    Lu, Yutong
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 366 - 374
  • [14] CPU+GPU scheduling with asymptotic profiling
    Wang, Zhenning
    Zheng, Long
    Chen, Quan
    Guo, Minyi
    PARALLEL COMPUTING, 2014, 40 (02) : 107 - 115
  • [15] Integer Programming Based Heterogeneous CPU-GPU Cluster Scheduler for SLURM Resource Manager
    Soner, Seren
    Ozturan, Can
    2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 418 - 424
  • [16] KubeSC-RTP: Smart scheduler for Kubernetes platform on CPU-GPU heterogeneous systems
    Harichane, Ishak
    Makhlouf, Sid Ahmed
    Belalem, Ghalem
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (21)
  • [17] The Scheduling Based on Machine Learning for Heterogeneous CPU/GPU Systems
    Shulga, D. A.
    Kapustin, A. A.
    Kozlov, A. A.
    Kozyrev, A. A.
    Rovnyagin, M. M.
    PROCEEDINGS OF THE 2016 IEEE NORTH WEST RUSSIA SECTION YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING CONFERENCE (ELCONRUSNW), 2016, : 345 - 348
  • [18] A hybrid CPU/GPU approach for optimizing sorting throughput
    Gowanlock, Michael
    Karsin, Ben
    PARALLEL COMPUTING, 2019, 85 : 45 - 55
  • [19] Energy-Efficient Resource Management for Federated Edge Learning With CPU-GPU Heterogeneous Computing
    Zeng, Qunsong
    Du, Yuqing
    Huang, Kaibin
    Leung, Kin K.
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2021, 20 (12) : 7947 - 7962
  • [20] A User-Centric CPU-GPU Governing Framework for 3-D Mobile Games
    Chen, Wei-Ming
    Cheng, Sheng-Wei
    Hsiu, Pi-Cheng
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (05) : 961 - 974