Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs

Cited by: 13

Authors:

Papakonstantinou, Alexandros [1]

Gururaj, Karthik [2]

Stratton, John A. [1]

Chen, Deming [1]

Cong, Jason [2]

Hwu, Wen-Mei W. [1]

Affiliations:

[1] Univ Illinois, Elect & Comp Engn Dept, Urbana, IL 60680 USA

[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2013 / Vol. 13 / No. 02

Keywords:

Design; Performance; FPGA; high-level synthesis; parallel programming model; high-performance computing; source-to-source compiler; heterogeneous compute systems

DOI:

10.1145/2514641.2514652

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs, in particular, are becoming very popular for speeding up compute-intensive kernels of scientific, imaging, and simulation applications. New programming models that facilitate parallel processing on heterogeneous systems containing GPUs are spreading rapidly in the computing community. By leveraging these investments, the developers of other accelerators have an opportunity to significantly reduce the programming effort by supporting those accelerator models already gaining popularity. In this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx) which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SIMT (Single Instruction, Multiple Thread) CUDA code into task-level parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multicore accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
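
To make the SIMT-to-C idea in the abstract concrete, the following is a minimal hand-written sketch, not output of FCUDA. It assumes a simple vector-add kernel and shows one plausible way implicit CUDA threads can be serialized into explicit loops inside a per-block C task. The names `dim1`, `vec_add_block`, and `vec_add_grid` are hypothetical, and any AutoPilot-specific pragmas or annotations the real flow would emit are omitted.

```c
#include <assert.h>

/* Hypothetical one-dimensional stand-in for CUDA's built-in dim3 type. */
typedef struct { int x; } dim1;

/* CUDA kernel being modeled (shown for reference only):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block expressed as a plain C task: the implicit SIMT
 * threads become an explicit loop over the thread index. */
static void vec_add_block(const float *a, const float *b, float *c, int n,
                          dim1 blockIdx, dim1 blockDim)
{
    for (int tid = 0; tid < blockDim.x; ++tid) {
        int i = blockIdx.x * blockDim.x + tid;
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
}

/* The grid becomes a loop over independent block tasks. In hardware,
 * independent tasks like these are candidates for parallel cores. */
static void vec_add_grid(const float *a, const float *b, float *c, int n,
                         int threads_per_block)
{
    assert(threads_per_block > 0);
    dim1 blockDim = { threads_per_block };
    int num_blocks = (n + threads_per_block - 1) / threads_per_block;

    for (int blk = 0; blk < num_blocks; ++blk) {
        dim1 blockIdx = { blk };
        vec_add_block(a, b, c, n, blockIdx, blockDim);
    }
}
```

Calling `vec_add_grid(a, b, c, n, 2)` on arrays of length 5 runs three block tasks, with the bounds check masking the unused thread slot in the last block, just as the `if (i < n)` guard does in the CUDA original.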

Pages: 26
