OpenCL 2.0 Compiler Adaptation on LLVM for PTX Simulators

Cited by: 5

Authors:

Yang, Chun-Chieh [1]

Wang, Shao-Chung [1]

Hsu, Min-Yi [1]

Chang, Yuan-Ming [1]

Hwang, Yuan-Shin [2]

Lee, Jenq-Kuen [1]

Affiliations:

[1] National Tsing Hua University, Department of Computer Science, Hsinchu, Taiwan

[2] National Taiwan University of Science and Technology, Taipei, Taiwan

Source:

2017 46th International Conference on Parallel Processing Workshops (ICPPW) | 2017

Keywords:

LLVM; OpenCL; PTX; Libclc; GPGPU-Sim;

DOI:

10.1109/ICPPW.2017.21

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

OpenCL continues to gather momentum on both desktop and mobile devices. The new features of OpenCL 2.0 give developers greater expressive power when programming heterogeneous computing environments. Among current experimental simulation environments, gem5-gpu supports only CUDA, whereas GPGPU-Sim can support OpenCL by compiling OpenCL kernel code to PTX with a real GPU driver. However, this driver-based compilation in GPGPU-Sim supports only up to OpenCL 1.2. To support OpenCL 2.0, the compiler must be extended to compile OpenCL 2.0 kernel code to PTX. This paper reports our experience in enabling that compiler flow. OpenCL 2.0 provides new features such as dynamic parallelism, work-group built-in functions, and extended atomic built-in functions. The proposed compiler, modified from the Low Level Virtual Machine (LLVM), adds these features so that the emulator can support OpenCL 2.0. Specifically, it implements the OpenCL 2.0 enqueue-kernel feature using the existing dynamic parallelism APIs in CUDA and revises the compilation scheme in Clang. It also creates local buffers for each work-group to use in work-group built-in functions, and adds OpenCL 2.0 atomic built-in functions with memory order and memory scope to the NVPTX back end. Benchmark results show that the proposed compiler supports the targeted features.
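The abstract mentions that work-group built-in functions are served by a local buffer created for each work-group. The following is only an illustrative sketch of that general idea, not the paper's implementation: it emulates, on the host and in plain Python, how a sum reduction over one work-group could be computed through such a buffer. The function name `work_group_reduce_add`, the `local_buf` list, and the tree-reduction schedule are all assumptions made for this example.

```python
from typing import List


def work_group_reduce_add(values: List[int]) -> int:
    """Emulate a sum reduction across the work-items of one work-group.

    values[i] is the value contributed by work-item i. The list `local_buf`
    stands in for a per-work-group local-memory buffer (a hypothetical
    layout chosen for illustration).
    """
    n = len(values)
    if n == 0:
        return 0

    # Step 1: every work-item stores its own value into the local buffer.
    local_buf = list(values)

    # Step 2: tree reduction. Each pass of the outer loop corresponds to the
    # work done between two work-group barriers on a real device.
    stride = 1
    while stride < n:
        # Only work-items whose id is a multiple of 2 * stride are active.
        for lid in range(0, n, 2 * stride):
            if lid + stride < n:
                local_buf[lid] += local_buf[lid + stride]
        stride *= 2  # a work-group barrier would be placed here

    # Step 3: every work-item reads the final result from slot 0.
    return local_buf[0]
```

In each pass the active work-items write only to slots that are multiples of `2 * stride` and read only from slots that are not, so the sequential inner loop gives the same result a barrier-synchronized parallel execution would.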


Pages: 53-58

Page count: 6
