ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels

Cited by: 2
Authors
Fang, Jianbin [1]
Varbanescu, Ana Lucia [1]
Shen, Jie [1]
Sips, Henk [1]
Affiliations
[1] Delft Univ Technol, Parallel & Distributed Syst Grp, Delft, Netherlands
Source
PROCEEDINGS OF THE 2013 21ST EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING | 2013
Keywords
Local Memory; API; OpenCL; GPUs
DOI
10.1109/PDP.2013.61
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Recent parallel architectures are equipped with local memory, which simplifies hardware design at the cost of increased program complexity due to explicit management. To ease this extra burden on programmers, we introduce ELMO, an easy-to-use API that improves productivity while preserving the high performance of local memory operations. Specifically, ELMO is a generic API that covers different local memory use cases. We also present prototype implementations of these APIs and apply multiple GPU-inspired optimizations to maximize their performance. Experimental results on the NVIDIA Quadro5000 GPU show that using ELMO on native implementations significantly improves performance: the achieved speedup ranges from 1.3x to 3.7x. Furthermore, with ELMO we still achieve performance comparable to (if not better than) that of hand-tuned applications, while the code is shorter, clearer, and safer.
Pages: 375-383
Page count: 9