Efficient Inter-Device Task Scheduling Schemes for Multi-Device Co-Processing of Data-Parallel Kernels on Heterogeneous Systems

Cited by: 7

Authors:

Wan, Lanjun [1,2]

Zheng, Weihua [3]

Yuan, Xinpan [1]

Affiliations:

[1] Hunan Univ Technol, Sch Comp Sci, Zhuzhou 412007, Peoples R China

[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China

[3] Hunan Univ Technol, Coll Elect & Informat Engn, Zhuzhou 412007, Peoples R China

Source:

IEEE Access | 2021 / Vol. 9

Funding:

National Natural Science Foundation of China;

Keywords:

Dynamic scheduling; Task analysis; Processor scheduling; Kernel; Scheduling; Performance evaluation; Graphics processing units; Data-parallel kernels; heterogeneous systems; many-core accelerators; multi-core CPUs; multi-device co-processing; parallel applications; task scheduling; CPU; OPTIMIZATION;

DOI:

10.1109/ACCESS.2021.3073955

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Heterogeneous systems consisting of multiple multi-core CPUs and many-core accelerators have recently come into wide use, and a growing number of parallel applications are developed for such systems. To fully utilize multiple compute devices to cooperatively and concurrently execute data-parallel kernels on heterogeneous systems, a feedback-based dynamic and elastic task scheduling scheme is proposed. By flexibly and dynamically adjusting the workload between devices during execution, it provides better load balance, higher device utilization, and lower scheduling overhead. This scheme is well suited to data-parallel kernels whose computation and data are uniformly distributed, but less suited to kernels whose computation and data are non-uniformly distributed. To address this limitation, an asynchronous-based dynamic and elastic task scheduling scheme is proposed, which dynamically adjusts the chunk size according to performance changes at runtime and thereby avoids device underutilization, load imbalance across devices, and frequent kernel launches, inter-device data transfers, and inter-device synchronizations. Experiments with 8 representative parallel applications on a hybrid CPU-GPU-MIC system show that the two proposed inter-device task scheduling schemes achieve efficient CPU-GPU-MIC co-processing of different parallel applications by effectively partitioning work across devices.
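The abstract does not give the feedback rule itself, so the following is only a minimal sketch of one common way to adjust workload between devices from measured performance: after each round, every device's share of the next round is set proportional to the throughput it just achieved. The function names `rebalance` and `split_items`, the device labels, and the proportional-to-throughput rule are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict


def rebalance(shares: Dict[str, float], elapsed: Dict[str, float]) -> Dict[str, float]:
    """Return new workload shares proportional to measured throughput.

    shares:  fraction of the previous round's work given to each device.
    elapsed: seconds each device took to finish its share (assumed > 0).
    Hypothetical helper: the paper's actual update rule may differ.
    """
    # Throughput = fraction of work completed per second in the last round.
    throughput = {dev: shares[dev] / elapsed[dev] for dev in shares}
    total = sum(throughput.values())
    return {dev: tp / total for dev, tp in throughput.items()}


def split_items(n_items: int, shares: Dict[str, float]) -> Dict[str, int]:
    """Turn fractional shares into integer item counts that sum to n_items."""
    counts = {dev: int(n_items * share) for dev, share in shares.items()}
    # Assign the rounding remainder to the device with the largest share.
    largest = max(shares, key=shares.get)
    counts[largest] += n_items - sum(counts.values())
    return counts


if __name__ == "__main__":
    # Start with an even split, then feed back one round of timings.
    shares = {"cpu": 1 / 3, "gpu": 1 / 3, "mic": 1 / 3}
    elapsed = {"cpu": 4.0, "gpu": 1.0, "mic": 2.0}
    shares = rebalance(shares, elapsed)
    print(split_items(700, shares))
```

In this sketch the slowest device in the previous round receives the smallest share in the next one, which is the general idea behind feedback-driven load balancing; a real scheduler would also have to account for data-transfer cost and kernel-launch overhead, which the abstract mentions but this example ignores.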

Pages: 59968-59978

Page count: 11
