Load balancing in a heterogeneous world: CPU-Xeon Phi co-execution of data-parallel kernels

Cited by: 9
Authors
Nozal, Raul [1]
Perez, Borja [1]
Luis Bosque, Jose [1]
Beivide, Ramon [1]
Affiliations
[1] Univ Cantabria, Comp Sci & Elect Dept, Santander, Spain
Funding
European Research Council; European Union Horizon 2020
Keywords
Heterogeneous computing; Co-execution CPU-Xeon Phi; Load balancing; OpenCL; Performance portability; Energy efficiency;
DOI
10.1007/s11227-018-2318-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Heterogeneous systems composed of a CPU and a set of different hardware accelerators are very compelling thanks to their excellent performance and energy consumption features. One of the most important problems of those systems is the workload distribution among their devices. This paper describes an extension of the Maat library to allow the co-execution of a data-parallel OpenCL kernel on a heterogeneous system composed of a CPU and an Intel Xeon Phi. Maat provides an abstract view of the heterogeneous system as well as a set of load balancing algorithms to squeeze the performance out of the node. It automatically partitions the data and distributes it among the devices, generates the kernels, and efficiently merges the partial outputs. Experimental results show that this approach always outperforms the baseline with only a Xeon Phi, giving excellent performance and energy efficiency. Furthermore, it is essential to select the right load balancing algorithm because it has a huge impact on system performance and energy consumption.
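The partition, distribute, and merge steps mentioned in the abstract can be illustrated with a minimal sketch. It is not Maat's API and not one of the paper's algorithms as published: the function names `static_partition` and `co_execute`, the relative-speed weights, and the sequential "devices" are all assumptions made for illustration. It shows only the general idea of a static split, where each device receives a contiguous slice of the index space proportional to an assumed relative speed and the partial outputs are written back at their original offsets.

```python
from typing import Callable, List, Sequence, Tuple


def static_partition(total_items: int, speeds: Sequence[float]) -> List[Tuple[int, int]]:
    """Split [0, total_items) into contiguous (offset, size) chunks, one per device.

    `speeds` holds hypothetical relative device speeds (e.g. CPU = 1.0, Xeon Phi = 3.0);
    each chunk size is proportional to the corresponding speed.
    """
    if total_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative item count and positive speeds")
    total_speed = sum(speeds)
    chunks: List[Tuple[int, int]] = []
    offset = 0
    cumulative = 0.0
    for i, speed in enumerate(speeds):
        cumulative += speed
        # The last device takes whatever remains so that no item is lost to rounding.
        if i == len(speeds) - 1:
            end = total_items
        else:
            end = round(total_items * cumulative / total_speed)
        chunks.append((offset, end - offset))
        offset = end
    return chunks


def co_execute(kernel: Callable[[float], float], data: Sequence[float],
               speeds: Sequence[float]) -> List[float]:
    """Apply `kernel` element-wise, one chunk per simulated device, then merge.

    Devices are simulated sequentially here; a real runtime would launch the
    chunks concurrently on different OpenCL devices.
    """
    output: List[float] = [0.0] * len(data)
    for offset, size in static_partition(len(data), speeds):
        partial = [kernel(x) for x in data[offset:offset + size]]
        # Merge: each partial result goes back to its original position.
        output[offset:offset + size] = partial
    return output


if __name__ == "__main__":
    print(static_partition(1000, [1.0, 3.0]))
    print(co_execute(lambda x: x * x, list(range(10)), [1.0, 3.0]))
```

A static split like this is only as good as the speed estimates it starts from, which is one reason the choice of load balancing algorithm matters; dynamic schemes instead hand out smaller chunks at run time.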
Pages: 1123-1136
Number of pages: 14