Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

Cited by: 5
Authors
Perez, B. [1]
Stafford, E. [1]
Bosque, J. L. [1]
Beivide, R. [1]
Mateo, S. [2]
Teruel, X. [2]
Martorell, X. [2]
Ayguade, E. [2]
Affiliations
[1] Univ Cantabria, Dept Comp Sci & Elect, Santander, Spain
[2] Univ Politecn Cataluna, Barcelona Supercomp Ctr, Barcelona, Spain
Funding
European Research Council; EU Horizon 2020
Keywords
Heterogeneous systems; OmpSs programming model; OpenCL; Co-execution; MULTI-CPU;
DOI
10.1016/j.jpdc.2018.11.001
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues: first, the automatic distribution of datasets and the management of device memory address spaces; second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results. (C) 2018 Elsevier Inc. All rights reserved.
Pages: 45-57
Number of pages: 13
Related papers
6 items in total
  • [1] Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems
    Perez, B.
    Stafford, E.
    Bosque, J. L.
    Beivide, R.
    Mateo, S.
    Teruel, X.
    Martorell, X.
    Ayguade, E.
    2017 29TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 2017 : 1 - 8
  • [2] Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems
    Perez, Borja
    Stafford, E.
    Bosque, J. L.
    Beivide, R.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 157 : 30 - 42
  • [3] Hardware support for balanced co-execution in heterogeneous processors
    Perez, Borja
    Luis Bosque, Jose
    PROCEEDINGS OF THE 21ST ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2024, CF 2024, 2024 : 106 - 114
  • [4] Load balancing in a heterogeneous world: CPU-Xeon Phi co-execution of data-parallel kernels
    Raúl Nozal
    Borja Perez
    Jose Luis Bosque
    Ramón Beivide
    The Journal of Supercomputing, 2019, 75 : 1123 - 1136
  • [5] Load balancing in a heterogeneous world: CPU-Xeon Phi co-execution of data-parallel kernels
    Nozal, Raul
    Perez, Borja
    Luis Bosque, Jose
    Beivide, Ramon
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (03) : 1123 - 1136
  • [6] Fuzzy Active Learning to Detect OpenCL Kernel Heterogeneous Machines in Cyber Physical Systems
    Ahmed, Usman
    Lin, Jerry Chun-Wei
    Srivastava, Gautam
    Mekala, M. S.
    Jung, Ho-Youl
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (11) : 4618 - 4629