PPOpenCL: A Performance-Portable OpenCL Compiler with Host and Kernel Thread Code Fusion

Cited by: 57
Authors
Liu, Ying [1]
Huang, Lei [1]
Wu, Mingchuan [2,3]
Cui, Huimin [2,3]
Lv, Fang [1]
Feng, Xiaobing [2,3]
Xue, Jingling [4]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Chinese Acad Sci, ICT, SKL Comp Architecture, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
Source
PROCEEDINGS OF THE 28TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC '19) | 2019
Funding
National Natural Science Foundation of China; Australian Research Council
Keywords
Heterogeneous computing; Compiler; OpenCL
DOI
10.1145/3302516.3307350
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
OpenCL offers code portability but no performance portability. Given an OpenCL program X specifically written for one platform P, existing OpenCL compilers, which usually optimize its host and kernel codes individually, often yield poor performance for another platform Q. Instead of obtaining a performance-improved version of X for Q via manual tuning, we aim to achieve this automatically by a source-to-source OpenCL compiler framework, PPOpenCL. By fusing X's host and kernel thread codes (with the operations in different work-items in the same work-group represented explicitly), we are able to apply data flow analyses, and subsequently, performance-enhancing optimizations on a fused control flow graph specifically for platform Q. Validation against OpenCL benchmarks shows that PPOpenCL (implemented in Clang 3.9.1) can achieve significantly improved portable performance on seven platforms considered.
Pages: 2-16
Page count: 15
Related papers
11 in total
  • [1] pocl: A Performance-Portable OpenCL Implementation
    Jaaskelainen, Pekka
    Sanchez de La Lama, Carlos
    Schnetter, Erik
    Raiskila, Kalle
    Takala, Jarmo
    Berg, Heikki
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2015, 43 (05) : 752 - 785
  • [2] pocl: A Performance-Portable OpenCL Implementation
    Pekka Jääskeläinen
    Carlos Sánchez de La Lama
    Erik Schnetter
    Kalle Raiskila
    Jarmo Takala
    Heikki Berg
    International Journal of Parallel Programming, 2015, 43 : 752 - 785
  • [3] Developing Performance-Portable Molecular Dynamics Kernels in OpenCL
    Pennycook, S. J.
    Jarvis, S. A.
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 386 - 395
  • [4] Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks
    Tsai, Yaohung M.
    Luszczek, Piotr
    Kurzak, Jakub
    Dongarra, Jack
    PROCEEDINGS OF 2016 2ND WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC), 2016, : 9 - 18
  • [5] From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming
    Du, Peng
    Weber, Rick
    Luszczek, Piotr
    Tomov, Stanimire
    Peterson, Gregory
    Dongarra, Jack
    PARALLEL COMPUTING, 2012, 38 (08) : 391 - 407
  • [6] IDEFIX: A versatile performance-portable Godunov code for astrophysical flows
    Lesur, G.R.J.
    Baghdadi, S.
    Wafflard-Fernandez, G.
    Mauxion, J.
    Robert, C.M.T.
    Van Den Bossche, M.
    Astronomy and Astrophysics, 2023, 677
  • [7] Developing High-Performance, Portable OpenCL Code via Multi-Dimensional Homomorphisms
    Rasch, Ari
    Schulze, Richard
    Gorlatch, Sergei
    PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON OPENCL (IWOCL'19), 2019
  • [8] Generating Performance Portable Code using Rewrite Rules From High-Level Functional Expressions to High-Performance OpenCL Code
    Steuwer, Michel
    Fensch, Christian
    Lindley, Sam
    Dubach, Christophe
    PROCEEDINGS OF THE 20TH ACM SIGPLAN INTERNATIONAL CONFERENCE ON FUNCTIONAL PROGRAMMING (ICFP'15), 2015, : 205 - 217
  • [9] FusionCL: a machine-learning based approach for OpenCL kernel fusion to increase system performance
    Yasir Noman Khalid
    Muhammad Aleem
    Usman Ahmed
    Radu Prodan
    Muhammad Arshad Islam
    Muhammad Azhar Iqbal
    Computing, 2021, 103 : 2171 - 2202
  • [10] Generating Performance Portable Code using Rewrite Rules From High-Level Functional Expressions to High-Performance OpenCL Code
    Steuwer, Michel
    Fensch, Christian
    Lindley, Sam
    Dubach, Christophe
    ACM SIGPLAN NOTICES, 2015, 50 (09) : 205 - 217