Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs

Cited by: 13
Authors
Papakonstantinou, Alexandros [1]
Gururaj, Karthik [2]
Stratton, John A. [1]
Chen, Deming [1]
Cong, Jason [2]
Hwu, Wen-Mei W. [1]
Affiliations
[1] Univ Illinois, Elect & Comp Engn Dept, Urbana, IL 60680 USA
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Keywords
Design; Performance; FPGA; high-level synthesis; parallel programming model; high-performance computing; source-to-source compiler; heterogeneous compute systems
DOI
10.1145/2514641.2514652
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs, in particular, are becoming very popular for speeding up compute-intensive kernels of scientific, imaging, and simulation applications. New programming models that facilitate parallel processing on heterogeneous systems containing GPUs are spreading rapidly in the computing community. By leveraging these investments, the developers of other accelerators have an opportunity to significantly reduce the programming effort by supporting those accelerator models already gaining popularity. In this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx) which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SIMT (Single Instruction, Multiple Thread) CUDA code into task-level parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multicore accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
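The SIMT-to-C transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not FCUDA's actual output: the kernel, the function names (`vec_add_block`, `vec_add_grid`), and the parameter names are hypothetical, and the AutoPilot-specific annotations that the real flow would rely on are omitted. The sketch only shows the general idea of replacing implicit CUDA thread and block indices with explicit loops in plain C.

```c
/* Hypothetical CUDA kernel used as the starting point (reference only):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block, with the implicit threadIdx.x replaced by an explicit
 * loop. Assumption: the kernel has no intra-block synchronization, so a
 * single loop over the threads preserves its semantics. */
void vec_add_block(const float *a, const float *b, float *c,
                   int n, int block_idx, int block_dim)
{
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n)              /* same bounds guard as the CUDA kernel */
            c[i] = a[i] + b[i];
    }
}

/* The grid becomes a loop over blocks. Each call is an independent task,
 * which is the coarse-grained parallelism a synthesis tool could map to
 * several hardware cores; here the tasks simply run one after another. */
void vec_add_grid(const float *a, const float *b, float *c,
                  int n, int grid_dim, int block_dim)
{
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx)
        vec_add_block(a, b, c, n, block_idx, block_dim);
}
```

Calling `vec_add_grid(a, b, c, n, grid_dim, block_dim)` with `grid_dim * block_dim >= n` computes the same result as launching the hypothetical kernel with that grid and block size.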
Pages: 26
Related papers
50 in total
  • [1] FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs
    Papakonstantinou, Alexandros
    Gururaj, Karthik
    Stratton, John A.
    Chen, Deming
    Cong, Jason
    Hwu, Wen-Mei W.
    2009 IEEE 7TH SYMPOSIUM ON APPLICATION SPECIFIC PROCESSORS (SASP 2009), 2009, : 35 - +
  • [2] High-Performance CUDA Kernel Execution on FPGAs
    Papakonstantinou, Alexandros
    Gururaj, Karthik
    Stratton, John A.
    Chen, Deming
    Cong, Jason
    Hwu, Wen-Mei W.
    ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, 2009, : 515 - 516
  • [3] Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs
    Zohouri, Hamid Reza
    Maruyama, Naoya
    Smith, Aaron
    Matsuda, Motohiko
    Matsuoka, Satoshi
    SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2016, : 409 - 420
  • [4] Integrating FPGAs in High-Performance Computing: Introduction
    Chow, Paul
    Hutton, Mike
    FPGA 2007: FIFTEENTH ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2007, : 131 - 131
  • [5] Quantum Computing and High-Performance Computing: Compilation Stack Similarities
    Alarcon, Sonia Lopez
    Elster, Anne
    COMPUTING IN SCIENCE & ENGINEERING, 2022, 24 (06) : 66 - 71
  • [6] Quantum Computing and High-Performance Computing: Compilation Stack Similarities
    Lopez Alarcon, Sonia
    Elster, Anne C.
    Computing in Science and Engineering, 2022, 24 (06): 66 - 71
  • [7] CUDA: Scalable parallel programming for high-performance scientific computing
    Luebke, David
    2008 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING: FROM NANO TO MACRO, VOLS 1-4, 2008, : 836 - 838
  • [8] Integrating FPGAs in High-Performance Computing: The Architecture and Implementation Perspective
    Woods, Nathan
    FPGA 2007: FIFTEENTH ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2007, : 132 - 132
  • [9] A Multicore Architecture for High-Performance Scientific Computing using FPGAs
    Cobos Carrascosa, J. P.
    Aparicio del Moral, B.
    Ramos, J. L.
    Lopez Jimenez, A. C.
    del Toro Iniesta, J. C.
    2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC), 2014, : 223 - 228
  • [10] Evaluating High-Level Design Strategies on FPGAs for High-Performance Computing
    Podobas, Artur
    Zohouri, Hamid Reza
    Maruyama, Naoya
    Matsuoka, Satoshi
    2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017,