FPGA HPC using OpenCL: Case Study in 3D FFT

Cited by: 10
Authors
Sanaullah, Ahmed [1]
Herbordt, Martin C. [1]
Affiliation
[1] Boston Univ, Dept Elect & Comp Engn, Boston, MA 02215 USA
Funding
National Science Foundation (USA);
Keywords
ACCELERATION;
DOI
10.1145/3241793.3241800
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
FPGAs have typically achieved high speedups for 3D Fast Fourier Transforms (FFTs) due to the presence of hard floating point units, low latency specialized pipelines, and support for complex connectivity among processing elements. Previous implementations have relied on FFT IP cores for performing the computation due to the complexity of manually developing and maintaining/upgrading efficient pipelines in HDL. These IP cores, however, are bulky and cannot be fully tuned for specific FFT sizes due to use of generic architectures. HLS tools, such as OpenCL, offer a more customizable alternative but have suffered from worse performance than HDL in previous work. In this paper we show that, using a set of code structure optimizations, OpenCL designs can be compiled to Radix-2 FFT pipelines which outperform IP core based designs for the same throughput. We further show that the HDL generated by the OpenCL compiler can be isolated and seamlessly integrated into existing 3D FFT shells to reduce implementation effort. Our single device design, tested on the Altera Arria10X115 FPGA, achieves an average speedup of 29x vs CPU-MKL, 4.1x vs GPU cuFFT and 1.1x vs IP Core FFT implementations for 16³, 32³ and 64³ FFTs. Moreover, OpenCL generated compute pipelines for 8³, 16³, 32³ and 64³ FFTs use an average of 7.5x fewer ALMs and 1.6x fewer DSPs than corresponding IP core versions.
Pages: 6