Towards Transparently Tackling Functionality and Performance Issues Across Different OpenCL Platforms

Cited by: 10
Authors
Agosta, Giovanni [1]
Barenghi, Alessandro [1]
Pelosi, Gerardo [1]
Scandale, Michele [1]
Affiliations
[1] Politecn Milan, DEIB, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy
Source
2014 SECOND INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR) | 2014
Keywords
Parallel Architectures; Parallel Programming; OpenCL; GPGPU
DOI
10.1109/CANDAR.2014.53
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
OpenCL applications may present tight constraints on work-group size due to the algorithm design or the chosen implementation strategy. This may hamper functional or performance portability across different platforms, due to lack of resources. The current solution is to re-design the implementation, optimizing it for the new platform. However, this can become a showstopper for new platforms, for which a large manual optimization effort is needed to port benchmark suites and applications. In this work, we aim at tackling such issues by applying work-item coalescing techniques to optimize the mapping of the work-items to the processing elements. However, this is generally not sufficient to achieve good performance, as different design patterns may be applied to exploit the specific features of the target architecture. We show how additional target-specific transformations can improve the performance with respect to the work-item coalescing baseline. We employ a Matrix Multiply case study to show how the work-item coalescing transformations can impact functional portability, while also providing an opportunity to automatically insert asynchronous copies on embedded many-core platforms endowed with such a feature.
Pages: 130-136
Page count: 7
Related papers
1 in total
  • [1] Performance Portability Evaluation of OpenCL Benchmarks across Intel and NVIDIA Platforms
    Bertoni, Colleen
    Kwack, JaeHyuk
    Applencourt, Thomas
    Ghadar, Yasaman
    Homerding, Brian
    Knight, Christopher
    Videau, Brice
    Zheng, Huihuo
    Morozov, Vitali
    Parker, Scott
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020: 330-339