NoT: a high-level no-threading parallel programming method for heterogeneous systems

Cited by: 0

Authors:

Shusen Wu

Xiaoshe Dong

Xingjun Zhang

Zhengdong Zhu

Affiliation:

[1] Xi’an Jiaotong University, School of Electronic and Information Engineering

Source:

The Journal of Supercomputing | 2019 / Vol. 75

Keywords:

High-level parallel programming; Language construct; Association structure; Heterogeneous system; OpenCL

DOI:

Not available

Abstract:

Multithreading is the core of mainstream heterogeneous programming methods such as CUDA and OpenCL. However, multithreaded parallel programming requires programmers to handle low-level runtime details, making the programming process complex and error-prone. This paper presents no-threading (NoT), a high-level no-threading programming method. It introduces the association structure, a new language construct, to provide a declarative, runtime-free expression of different data parallelisms and avoid the use of multithreading. The NoT method designs C-like syntax for the association structure and implements a compiler and runtime system using OpenCL as an intermediate language. We demonstrate the effectiveness of our techniques with multiple benchmarks. The size of the NoT code is comparable to that of the serial code and is far less than that of the benchmark OpenCL code. The compiler generates efficient OpenCL code, yielding a performance competitive with or equivalent to that of the manually optimized benchmark OpenCL code on both a GPU platform and an MIC platform.

Pages: 3810-3841

Page count: 31

References: 61 in total
