Efficient Inter-Device Task Scheduling Schemes for Multi-Device Co-Processing of Data-Parallel Kernels on Heterogeneous Systems

Cited by: 7
Authors
Wan, Lanjun [1, 2]
Zheng, Weihua [3 ]
Yuan, Xinpan [1 ]
Affiliations
[1] Hunan Univ Technol, Sch Comp Sci, Zhuzhou 412007, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[3] Hunan Univ Technol, Coll Elect & Informat Engn, Zhuzhou 412007, Peoples R China
Source
IEEE ACCESS | 2021 / Vol. 9
Funding
National Natural Science Foundation of China
Keywords
Dynamic scheduling; Task analysis; Processor scheduling; Kernel; Scheduling; Performance evaluation; Graphics processing units; Data-parallel kernels; heterogeneous systems; many-core accelerators; multi-core CPUs; multi-device co-processing; parallel applications; task scheduling; CPU; OPTIMIZATION;
DOI
10.1109/ACCESS.2021.3073955
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Heterogeneous systems consisting of multiple multi-core CPUs and many-core accelerators have recently come into wide use, and more and more parallel applications are being developed for such systems. To fully utilize multiple compute devices that cooperatively and concurrently execute data-parallel kernels on heterogeneous systems, a feedback-based dynamic and elastic task scheduling scheme is proposed. It can provide better load balance, higher device utilization, and lower scheduling overhead by flexibly and dynamically adjusting the workload between devices during execution. This scheme is well suited to data-parallel kernels whose computation and data are uniformly distributed, but less suited to kernels whose computation and data are non-uniformly distributed. For the latter case, an asynchronous-based dynamic and elastic task scheduling scheme is proposed, which can avoid device underutilization, load imbalance across devices, and frequent kernel launches, inter-device data transfers, and inter-device synchronizations by dynamically adjusting the chunk size according to performance changes at runtime. A series of experiments is conducted with 8 representative parallel applications on a hybrid CPU-GPU-MIC system. The results show that the two proposed inter-device task scheduling schemes can achieve efficient CPU-GPU-MIC co-processing of different parallel applications by effectively partitioning work across devices.
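The abstract does not spell out how the workload is adjusted between devices, so the following is only a generic illustration of the feedback idea, not the authors' algorithm. It assumes a throughput-proportional rule: each device's share for the next round is set in proportion to the rate (items per second) it achieved in the previous round. The function name `rebalance` and its parameters are hypothetical.

```python
from typing import List


def rebalance(shares: List[int], times: List[float]) -> List[int]:
    """Repartition work in proportion to measured device throughput.

    Hypothetical helper, not taken from the paper.
    shares: number of work items each device processed last round.
    times:  elapsed seconds each device needed (assumed > 0).
    Returns new per-device item counts with the same total.
    """
    total = sum(shares)
    # Feedback signal: observed throughput of each device (items/second).
    rates = [s / t for s, t in zip(shares, times)]
    rate_sum = sum(rates)
    # Proportional split, rounded down to whole work items.
    new_shares = [int(total * r / rate_sum) for r in rates]
    # Rounding can leave a few items unassigned; give them to the fastest device.
    fastest = rates.index(max(rates))
    new_shares[fastest] += total - sum(new_shares)
    return new_shares


if __name__ == "__main__":
    # Three devices each received 100 items but took 1 s, 2 s and 4 s.
    print(rebalance([100, 100, 100], [1.0, 2.0, 4.0]))  # [173, 85, 42]
```

In this sketch a device that received no work in the previous round has a measured rate of zero and would never be assigned work again, so a practical scheduler would need a minimum share or a periodic probe for each device.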
Pages: 59968 - 59978
Number of pages: 11