Hybrid CPU-GPU scheduling and execution of tree traversals

Cited by: 1
Authors
Liu, Jianqiao [1]
Hegde, Nikhil [1]
Kulkarni, Milind [1]
Affiliation
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Keywords
Heterogeneous architectures; Scheduling; Irregular applications; Tree traversal
DOI
10.1145/2851141.2851174
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
GPUs offer the promise of massive, power-efficient parallelism. However, exploiting this parallelism requires code to be carefully structured to deal with the limitations of the SIMT execution model. In recent years, there has been much interest in mapping irregular applications to GPUs: applications with unpredictable, data-dependent behaviors. While most of the work in this space has focused on ad hoc implementations of specific algorithms, recent work has looked at generic techniques for mapping a large class of tree traversal algorithms to GPUs, through careful restructuring of the tree traversal algorithms to make them behave more regularly. Unfortunately, even this general approach for GPU execution of tree traversal algorithms relies on ad hoc, handwritten, algorithm-specific scheduling (i.e., assignment of threads to warps) to achieve high performance. The key challenge of scheduling is that it is a highly irregular process that requires inspecting thread behavior and then carefully sorting the threads into warps. In this paper, we present a novel scheduling and execution technique for tree traversal algorithms that is both general and automatic. The key novelty is a hybrid approach: the GPU partially executes tasks to inspect thread behavior and transmits information back to the CPU, which uses that information to perform the scheduling itself, before executing the remaining, carefully scheduled, portion of the traversals on the GPU. We applied this framework to five tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases, our hybrid approach is able to deliver better performance even than GPU code that uses hand-tuned, application-specific scheduling.
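The inspect-then-schedule idea in the abstract can be illustrated with a small CPU-only sketch. This is not the paper's implementation: the implicit binary tree over the interval [0, 1), the branch "signature" used as the inspection result, the warp size of 4, and every function name below are assumptions made purely for illustration. The sketch simulates a partial traversal for each query, sorts the queries by what that partial traversal observed, and packs neighbours into warps so that each warp follows fewer distinct paths.

```python
from typing import List, Sequence, Tuple

WARP_SIZE = 4  # hypothetical; chosen small so the example is easy to read


def partial_signature(query: float, depth: int) -> Tuple[int, ...]:
    """Inspection phase (simulated): descend `depth` levels of an implicit
    binary tree over [0, 1) and record each branch taken (0 = left, 1 = right)."""
    lo, hi = 0.0, 1.0
    branches = []
    for _ in range(depth):
        mid = (lo + hi) / 2
        if query < mid:
            branches.append(0)
            hi = mid
        else:
            branches.append(1)
            lo = mid
    return tuple(branches)


def schedule(queries: Sequence[float], depth: int) -> List[List[int]]:
    """Scheduling phase: sort query indices by their inspected signature,
    then cut the sorted order into warps of WARP_SIZE indices."""
    sigs = [partial_signature(q, depth) for q in queries]
    order = sorted(range(len(queries)), key=lambda i: sigs[i])
    return [order[i:i + WARP_SIZE] for i in range(0, len(order), WARP_SIZE)]


def divergence(warps: Sequence[Sequence[int]], queries: Sequence[float],
               depth: int) -> int:
    """Total number of distinct signatures per warp, summed over warps.
    Lower means the threads in each warp behave more alike."""
    return sum(
        len({partial_signature(queries[i], depth) for i in warp})
        for warp in warps
    )


if __name__ == "__main__":
    queries = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.05, 0.95]
    naive = [list(range(i, i + WARP_SIZE))
             for i in range(0, len(queries), WARP_SIZE)]
    scheduled = schedule(queries, depth=1)
    print("naive divergence:    ", divergence(naive, queries, depth=1))
    print("scheduled divergence:", divergence(scheduled, queries, depth=1))
```

In a real hybrid system the inspection step would run on the GPU and only its results would be copied back for sorting; here both phases run on the CPU to keep the sketch self-contained.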
Pages: 385-386
Page count: 2