pocl: A Performance-Portable OpenCL Implementation

Cited by: 64
Authors
Jaaskelainen, Pekka [1]
Sanchez de La Lama, Carlos [2]
Schnetter, Erik [3, 4, 5]
Raiskila, Kalle [6]
Takala, Jarmo [1]
Berg, Heikki [6]
Affiliations
[1] Tampere Univ Technol, FIN-33101 Tampere, Finland
[2] Knowledge Dev POF, Madrid, Spain
[3] Perimeter Inst Theoret Phys, Waterloo, ON, Canada
[4] Univ Guelph, Dept Phys, Guelph, ON N1G 2W1, Canada
[5] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
[6] Nokia Res Ctr, Espoo, Finland
Funding
U.S. National Science Foundation; Natural Sciences and Engineering Research Council of Canada; Academy of Finland
Keywords
OpenCL; LLVM; GPGPU; VLIW; SIMD; Parallel programming; Heterogeneous platforms; Performance portability
DOI
10.1007/s10766-014-0320-y
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both those already commercialized and those still under research. The paper describes how the portability of the implementation is achieved. We test the two aspects of portability by utilizing the kernel compiler and the OpenCL implementation to run OpenCL applications on various platforms with different styles of parallel resources. The results show that most of the benchmarked applications, when compiled using pocl, were faster than or close to as fast as the best proprietary OpenCL implementation for the platform at hand.
Pages: 752-785
Page count: 34