Hybrid CPU-GPU scheduling and execution of tree traversals

Cited by: 1
Authors
Liu, Jianqiao [1]
Hegde, Nikhil [1]
Kulkarni, Milind [1]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Keywords
Heterogeneous architectures; Scheduling; Irregular applications; Tree traversal
DOI
10.1145/2851141.2851174
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
GPUs offer the promise of massive, power-efficient parallelism. However, exploiting this parallelism requires code to be carefully structured to deal with the limitations of the SIMT execution model. In recent years, there has been much interest in mapping irregular applications to GPUs: applications with unpredictable, data-dependent behaviors. While most of the work in this space has focused on ad hoc implementations of specific algorithms, recent work has looked at generic techniques for mapping a large class of tree traversal algorithms to GPUs, through careful restructuring of the tree traversal algorithms to make them behave more regularly. Unfortunately, even this general approach for GPU execution of tree traversal algorithms is reliant on ad hoc, handwritten, algorithm-specific scheduling (i.e., assignment of threads to warps) to achieve high performance. The key challenge of scheduling is that it is a highly irregular process that requires the inspection of thread behavior and then careful sorting of the threads into warps. In this paper, we present a novel scheduling and execution technique for tree traversal algorithms that is both general and automatic. The key novelty is a hybrid approach: the GPU partially executes tasks to inspect thread behavior and transmits information back to the CPU, which uses that information to perform the scheduling itself, before executing the remaining, carefully scheduled, portion of the traversals on the GPU. We applied this framework to five tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases, our hybrid approach is able to deliver better performance even than GPU code that uses hand-tuned, application-specific scheduling.
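The inspect-then-schedule idea in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' implementation: the abstract does not say what information the GPU sends back or how the CPU sorts on it, so the sketch assumes the inspected behavior is the sequence of left/right branch decisions over the first few levels of an implicit binary tree on [0, 1). The names `inspect_prefix`, `schedule`, `distinct_prefixes`, and the toy `WARP_SIZE` of 4 are all hypothetical.

```python
from typing import List, Sequence, Tuple

WARP_SIZE = 4  # hypothetical toy value for readability; NVIDIA warps have 32 threads


def inspect_prefix(query: float, depth: int) -> Tuple[int, ...]:
    """Stand-in for the partial GPU execution step (an assumption of this sketch).

    Records the left (0) / right (1) decisions a query makes over the first
    `depth` levels of an implicit binary tree that bisects [0, 1).
    """
    lo, hi = 0.0, 1.0
    path = []
    for _ in range(depth):
        mid = (lo + hi) / 2
        if query < mid:
            path.append(0)
            hi = mid
        else:
            path.append(1)
            lo = mid
    return tuple(path)


def schedule(queries: Sequence[float], depth: int) -> List[List[int]]:
    """CPU-side step: sort query indices by inspected prefix, then cut into warps."""
    prefixes = [inspect_prefix(q, depth) for q in queries]
    order = sorted(range(len(queries)), key=lambda i: prefixes[i])
    return [order[i:i + WARP_SIZE] for i in range(0, len(order), WARP_SIZE)]


def distinct_prefixes(warps: List[List[int]], queries: Sequence[float],
                      depth: int) -> List[int]:
    """Count distinct prefixes per warp; 1 means every thread took the same path."""
    return [len({inspect_prefix(queries[i], depth) for i in warp}) for warp in warps]


queries = [0.1, 0.9, 0.15, 0.8, 0.12, 0.95, 0.2, 0.85]
naive_warps = [list(range(i, i + WARP_SIZE)) for i in range(0, len(queries), WARP_SIZE)]
sorted_warps = schedule(queries, depth=1)
```

With these inputs, the naive assignment mixes left-going and right-going queries in each warp, while the sorted assignment groups queries that share a prefix, which is the property a SIMT machine benefits from. The remaining traversal would then run on the GPU in the scheduled order; that step is omitted here.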
Pages: 385-386
Page count: 2
Related papers
38 in total
  • [1] A user mode CPU-GPU scheduling framework for hybrid workloads
    Wang, Bin
    Ma, Ruhui
    Qi, Zhengwei
    Yao, Jianguo
    Guan, Haibing
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 63 : 25 - 36
  • [2] Scheduling concurrent applications on a cluster of CPU-GPU nodes
    Ravi, Vignesh T.
    Becchi, Michela
    Jiang, Wei
    Agrawal, Gagan
    Chakradhar, Srimat
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2013, 29 (08) : 2262 - 2271
  • [3] A Flexible Scheduling Framework for Heterogeneous CPU-GPU Clusters
    Sajjapongse, Kittisak
    Agarwal, Tejaswi
    Becchi, Michela
    2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2014
  • [4] Prediction Model for Scheduling an Irregular Graph Algorithms on CPU-GPU Hybrid Cluster Framework
    Chandrashekhar, B. N.
    Sanjay, H. A.
    Lakshmi, H.
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT-2020), 2020, : 584 - 589
  • [5] Lifetime-Driven OpenCL Application Scheduling on CPU-GPU MPSoC
    Cao K.
    Long S.
    Li Z.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (05) : 976 - 991
  • [6] Exploration on Task Scheduling Strategy for CPU-GPU Heterogeneous Computing System
    Fang, Juan
    Zhang, Jiaxing
    Lu, Shuaibing
    Zhao, Hui
    2020 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2020), 2020, : 306 - 311
  • [7] Energy Efficient Job Scheduling with DVFS for CPU-GPU Heterogeneous Systems
    Chau, Vincent
    Chu, Xiaowen
    Liu, Hai
    Leung, Yiu-Wing
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON FUTURE ENERGY SYSTEMS (E-ENERGY'17), 2017, : 1 - 11
  • [8] A Simulation Framework for Scheduling Performance Evaluation on CPU-GPU Heterogeneous System
    Vella, Flavio
    Neri, Igor
    Gervasi, Osvaldo
    Tasso, Sergio
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT IV, 2012, 7336 : 457 - 469
  • [9] Co-Scheduling on Fused CPU-GPU Architectures With Shared Last Level Caches
    Damschen, Marvin
    Mueller, Frank
    Henkel, Joerg
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (11) : 2337 - 2347
  • [10] Reducing CPU-GPU Interferences to Improve CPU Performance in Heterogeneous Architectures
    Wen H.
    Zhang W.
    Journal of Computing Science and Engineering, 2020, 16 (04) : 131 - 145