Efficient Inter-Device Task Scheduling Schemes for Multi-Device Co-Processing of Data-Parallel Kernels on Heterogeneous Systems

Cited by: 7
Authors
Wan, Lanjun [1, 2]
Zheng, Weihua [3]
Yuan, Xinpan [1]
Affiliations
[1] Hunan Univ Technol, Sch Comp Sci, Zhuzhou 412007, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[3] Hunan Univ Technol, Coll Elect & Informat Engn, Zhuzhou 412007, Peoples R China
Source
IEEE Access | 2021 / Vol. 9
Funding
National Natural Science Foundation of China
Keywords
Dynamic scheduling; Task analysis; Processor scheduling; Kernel; Scheduling; Performance evaluation; Graphics processing units; Data-parallel kernels; heterogeneous systems; many-core accelerators; multi-core CPUs; multi-device co-processing; parallel applications; task scheduling; CPU; OPTIMIZATION;
DOI
10.1109/ACCESS.2021.3073955
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Heterogeneous systems consisting of multiple multi-core CPUs and many-core accelerators have recently come into wide use, and more and more parallel applications are being developed for such systems. To fully utilize multiple compute devices to cooperatively and concurrently execute data-parallel kernels on heterogeneous systems, a feedback-based dynamic and elastic task scheduling scheme is proposed, which provides better load balance, higher device utilization, and lower scheduling overhead by flexibly and dynamically adjusting the workload between devices during execution. This scheme is well suited to data-parallel kernels whose computation and data are uniformly distributed, but less suited to kernels whose computation and data are non-uniformly distributed. Thus, an asynchronous-based dynamic and elastic task scheduling scheme is also proposed, which avoids device underutilization, load imbalance across devices, and frequent kernel launches, inter-device data transfers, and inter-device synchronizations by dynamically adjusting the chunk size according to performance changes at runtime. A series of experiments is conducted with 8 representative parallel applications on a hybrid CPU-GPU-MIC system. The results show that the two proposed inter-device task scheduling schemes achieve efficient CPU-GPU-MIC co-processing of different parallel applications by effectively partitioning work across devices.
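The abstract does not give the scheduling rules themselves, so the following is only an illustrative sketch of the two general ideas it names: feedback-based workload adjustment between devices, and runtime chunk-size adjustment. The function names `repartition` and `next_chunk_size`, the proportional-to-throughput split, and the doubling/halving rule are all assumptions made for illustration and are not taken from the paper.

```python
def repartition(total_items, prev_items, prev_times):
    """Split total_items across devices in proportion to measured throughput.

    prev_items[i] and prev_times[i] are the work-item count and elapsed time
    (assumed > 0) observed for device i in the previous round. This
    proportional rule is an assumption, not the paper's scheme.
    """
    throughputs = [n / t for n, t in zip(prev_items, prev_times)]
    total_tp = sum(throughputs)
    shares = [int(total_items * tp / total_tp) for tp in throughputs]
    # Hand the rounding remainder to the fastest device so no item is lost.
    fastest = throughputs.index(max(throughputs))
    shares[fastest] += total_items - sum(shares)
    return shares


def next_chunk_size(chunk, prev_rate, cur_rate, min_chunk=64, max_chunk=1 << 20):
    """Grow the chunk while throughput holds or improves, shrink it otherwise.

    The doubling/halving policy and the bounds are hypothetical defaults.
    """
    if cur_rate >= prev_rate:
        return min(chunk * 2, max_chunk)
    return max(chunk // 2, min_chunk)


if __name__ == "__main__":
    # Three devices processed 100 items each in 1 s, 2 s, and 4 s.
    print(repartition(1000, [100, 100, 100], [1.0, 2.0, 4.0]))
    print(next_chunk_size(1024, prev_rate=10.0, cur_rate=12.0))
```

In this sketch a faster device receives a proportionally larger share in the next round, and larger chunks reduce the number of kernel launches and transfers as long as measured throughput does not drop.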
Pages: 59968-59978
Number of pages: 11