ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels

Cited by: 2

Authors:

Fang, Jianbin [1]

Varbanescu, Ana Lucia [1]

Shen, Jie [1]

Sips, Henk [1]

Affiliations:

[1] Delft University of Technology, Parallel and Distributed Systems Group, Delft, Netherlands

Source:

Proceedings of the 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing | 2013

Keywords:

Local Memory; API; OpenCL; GPUs

DOI:

10.1109/PDP.2013.61

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent parallel architectures are equipped with local memory, which simplifies hardware design at the cost of increased program complexity due to explicit management. To ease this extra burden on programmers, we introduce an easy-to-use API, ELMO, that improves productivity while preserving high performance of local memory operations. Specifically, ELMO is a generic API that covers different local memory use-cases. We also present prototype implementations for these APIs and perform multiple GPU-inspired optimizations to maximize their performance. Experimental results on the NVIDIA Quadro5000 GPU show that performance is significantly improved by using ELMO on native implementations: the achieved speedup ranges from 1.3x to 3.7x. Furthermore, using ELMO we still achieve performance comparable to (if not better than) that of hand-tuned applications, while the code is shorter, clearer, and safer.

Pages: 375-383

Page count: 9
