Kernel Composition in SYCL

Cited by: 4
Authors
Potter, Ralph [1 ]
Keir, Paul [2 ]
Bradford, Russell J. [1 ]
Murray, Alastair [3 ]
Affiliations
[1] Univ Bath, Dept Comp Sci, Bath, Avon, England
[2] Univ West Scotland, Sch Engn & Comp, Paisley, Renfrew, Scotland
[3] Codeplay Software Ltd, Edinburgh, Midlothian, Scotland
Source
INTERNATIONAL WORKSHOP ON OPENCL 2015 | 2015
Keywords
OpenCL; SYCL; GPUs; kernel fusion;
DOI
10.1145/2791321.2791332
Chinese Library Classification
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Parallel primitives libraries lower the knowledge barrier for developers who want to write parallel applications and accelerate them with OpenCL. Unfortunately, some current libraries implement each primitive as an individual kernel and so incur a high performance cost in off-chip memory operations for intermediate variables. We describe a methodology for creating efficient domain-specific embedded languages on top of the SYCL for OpenCL standard for parallel programming. Using this approach, we developed a small example language that provides an environment for composing image processing pipelines from a library of more primitive operations, while retaining the capability to generate a single kernel from a complex expression and so eliminate unnecessary intermediate loads and stores to global memory. Eliminating these global memory accesses leads to a 2.75x speedup over an unsharp mask implemented with OpenCLIPP. We give details of our domain-specific embedded language and provide experimental performance measurements of both individual primitives and an unsharp mask operation composed of multiple primitives.
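The fusion idea in the abstract can be illustrated with a minimal host-side sketch in plain C++. This is not the paper's embedded language and uses no SYCL API; every name here (`Image`, `Blur3`, `Sharpen`, `blur3`, `sharpen`, `evaluate`, `unsharp_mask`) is hypothetical, and the 1D 3-tap box blur is an assumed stand-in for a real image blur. Primitives are composed as expression objects, and the whole expression is evaluated in a single loop, so no intermediate buffer is written for the blurred image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Leaf node: reads one pixel from an existing buffer (hypothetical name).
struct Image {
    const std::vector<float>& data;
    float operator()(std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Primitive: 3-tap box blur with clamped borders, computed on demand
// from its source expression instead of being stored in a buffer.
template <typename E>
struct Blur3 {
    E src;
    float operator()(std::size_t i) const {
        std::size_t lo = (i == 0) ? 0 : i - 1;
        std::size_t hi = (i + 1 < src.size()) ? i + 1 : i;
        return (src(lo) + src(i) + src(hi)) / 3.0f;
    }
    std::size_t size() const { return src.size(); }
};

// Primitive: a + k * (a - b), the arithmetic core of an unsharp mask.
template <typename A, typename B>
struct Sharpen {
    A a;
    B b;
    float k;
    float operator()(std::size_t i) const { return a(i) + k * (a(i) - b(i)); }
    std::size_t size() const { return a.size(); }
};

// Helper functions that build expression nodes with deduced types.
template <typename E>
Blur3<E> blur3(E e) { return {e}; }

template <typename A, typename B>
Sharpen<A, B> sharpen(A a, B b, float k) { return {a, b, k}; }

// The single "fused kernel": one pass over the output, one store per pixel.
// In a SYCL setting this loop body would be the body of one kernel.
template <typename E>
std::vector<float> evaluate(const E& expr) {
    std::vector<float> out(expr.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = expr(i);
    return out;
}

// Unsharp mask composed from the two primitives above.
inline std::vector<float> unsharp_mask(const std::vector<float>& in, float amount) {
    assert(!in.empty());
    Image img{in};
    return evaluate(sharpen(img, blur3(img), amount));
}
```

Calling `unsharp_mask(pixels, 1.0f)` builds the expression `sharpen(img, blur3(img), 1.0f)` and evaluates it in one pass. A library that ran the blur and the sharpen step as separate kernels would instead write the blurred image to memory and read it back, which is the intermediate traffic the abstract describes eliminating.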
Pages: 7