Support OpenCL 2.0 Compiler on LLVM for PTX Simulators

Cited by: 2

Authors:

Yang, Chun-Chieh [1]

Wang, Shao-Chung [1]

Hsu, Min-Yi [1]

Chang, Yuan-Ming [1]

Hwang, Yuan-Shin [2]

Lee, Jenq-Kuen [1]

Affiliations:

[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan

[2] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei, Taiwan

Source:

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2019 / Vol. 91 / Issue 3-4

Keywords:

OpenCL; Gem5-gpu; LLVM; Libclc; PTX;

DOI:

10.1007/s11265-018-1377-4

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Heterogeneous systems that consist of multiple CPUs and GPUs for high-performance computing are becoming increasingly popular, and OpenCL (Open Computing Language) provides a framework for writing programs that can be executed across heterogeneous devices. Compared with OpenCL 1.2, the new features of OpenCL 2.0 provide developers with better expressive power for programming heterogeneous computing environments. Currently, gem5-gpu, which includes gem5 and GPGPU-Sim, can offer an experimental simulation environment for OpenCL. In gem5-gpu, gem5 only supports CUDA, although GPGPU-Sim can support OpenCL by compiling an OpenCL kernel code to PTX code using real GPU drivers. However, this compilation flow in GPGPU-Sim can only support up to OpenCL 1.2. OpenCL 2.0 provides new features such as workgroup built-in functions, extended atomic built-in functions, and device-side enqueue. To support OpenCL 2.0, the compiler must be extended to enable the compilation of OpenCL 2.0 kernel code to PTX code. In this paper, the proposed compiler is modified from the low level virtual machine (LLVM) compiler to extend such features to enhance the emulator to support OpenCL 2.0. The proposed compiler creates local buffers for each workgroup to enable workgroup built-in functions and adds atomic built-in functions with memory order and memory scope for OpenCL 2.0 in NVPTX. Furthermore, the APIs available in CUDA are utilized to implement the OpenCL 2.0 device-side enqueue kernel and compilation schemes in Clang are revised. The AMD APP SDK 3.0 and NTU OpenCL benchmarks are used to verify that the proposed compiler can support the features of OpenCL 2.0.

Citations

Pages: 261-271

Page count: 11

8 references in total

[1] [Anonymous], ILVM INSTRUCTION SET

[2] Bakhoda A, 2009, INT SYM PERFORM ANAL, P163, DOI 10.1109/ISPASS.2009.4919648

[3] Binkert Nathan, 2011, Computer Architecture News, V39, P1, DOI 10.1145/2024716.2024718

[4] Lattner C, 2004, INT SYM CODE GENER, P75, DOI 10.1109/CGO.2004.1281665

[5] gem5-gpu: A Heterogeneous CPU-GPU Simulator [J]. Power, Jason; Hestness, Joel; Orr, Marc S.; Hill, Mark D.; Wood, David A. IEEE COMPUTER ARCHITECTURE LETTERS, 2015, 14 (01): 34-36

[6] Sharlet D., 2012, PROC GEN M LLVM DEV

[7] Wang L., 2017, ISPASS 2017 POSTER

[8] OpenCL 2.0 Compiler Adaptation on LLVM for PTX Simulators [J]. Yang, Chun-Chieh; Wang, Shao-Chung; Hsu, Min-Yi; Chang, Yuan-Ming; Hwang, Yuan-Shin; Lee, Jenq-Kuen. 2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPPW), 2017: 53-58
