Efficient Inter-Device Task Scheduling Schemes for Multi-Device Co-Processing of Data-Parallel Kernels on Heterogeneous Systems

Cited by: 7

Authors:

Wan, Lanjun [1,2]

Zheng, Weihua [3]

Yuan, Xinpan [1]

Affiliations:

[1] Hunan Univ Technol, Sch Comp Sci, Zhuzhou 412007, Peoples R China

[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China

[3] Hunan Univ Technol, Coll Elect & Informat Engn, Zhuzhou 412007, Peoples R China

Source:

IEEE ACCESS | 2021 / Vol. 9

Funding:

National Natural Science Foundation of China;

Keywords:

Dynamic scheduling; Task analysis; Processor scheduling; Kernel; Scheduling; Performance evaluation; Graphics processing units; Data-parallel kernels; heterogeneous systems; many-core accelerators; multi-core CPUs; multi-device co-processing; parallel applications; task scheduling; CPU; OPTIMIZATION;

DOI:

10.1109/ACCESS.2021.3073955

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Heterogeneous systems consisting of multiple multi-core CPUs and many-core accelerators have recently come into wide use, and more and more parallel applications are being developed for such systems. To fully utilize multiple compute devices to cooperatively and concurrently execute data-parallel kernels on heterogeneous systems, a feedback-based dynamic and elastic task scheduling scheme is proposed, which can provide better load balance, higher device utilization, and lower scheduling overhead by flexibly and dynamically adjusting the workload between devices during execution. This scheme is well suited to data-parallel kernels whose computation and data are uniformly distributed, but less suited to data-parallel kernels whose computation and data are non-uniformly distributed. Therefore, an asynchronous-based dynamic and elastic task scheduling scheme is also proposed, which can avoid device underutilization, load imbalance across devices, and frequent kernel launches, inter-device data transfers, and inter-device synchronizations by dynamically adjusting the chunk size according to performance changes at runtime. A series of experiments is conducted with 8 representative parallel applications on a hybrid CPU-GPU-MIC system. The results show that the two proposed inter-device task scheduling schemes achieve efficient CPU-GPU-MIC co-processing of different parallel applications by effectively partitioning work across devices.
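The two ideas named in the abstract, feedback-driven workload adjustment and runtime chunk-size adjustment, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give formulas, so the proportional-to-throughput split, the doubling/halving rule, the bounds `lo` and `hi`, and the function names `rebalance` and `next_chunk_size` are all assumptions made here for illustration.

```python
def rebalance(work_items, chunk_sizes, elapsed):
    """Split work_items across devices in proportion to measured throughput.

    chunk_sizes[i] is how many items device i processed in the last round,
    elapsed[i] is how long that took (seconds). Assumed rule, not from the paper.
    """
    rates = [n / t for n, t in zip(chunk_sizes, elapsed)]  # items per second
    total_rate = sum(rates)
    shares = [int(work_items * r / total_rate) for r in rates]
    # Hand any rounding leftovers to the fastest device so nothing is dropped.
    shares[rates.index(max(rates))] += work_items - sum(shares)
    return shares


def next_chunk_size(chunk, prev_rate, rate, lo=64, hi=65536):
    """Pick the next chunk size for one device from its throughput trend.

    Assumed rule: double the chunk while throughput improves, halve it
    otherwise, and clamp the result to the range [lo, hi].
    """
    if rate > prev_rate:
        chunk *= 2
    else:
        chunk //= 2
    return max(lo, min(hi, chunk))


# Three hypothetical devices each processed 100 items in 1.0 s, 0.5 s and 2.0 s.
print(rebalance(1000, [100, 100, 100], [1.0, 0.5, 2.0]))  # [285, 573, 142]
print(next_chunk_size(1024, prev_rate=10.0, rate=12.0))   # 2048
```

In this sketch, a larger chunk means fewer kernel launches and synchronizations, while a smaller chunk limits how much work a slow device can hold at once; the trade-off between the two is what a runtime chunk-size policy has to manage.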


Pages: 59968-59978

Page count: 11
