FPGA HPC using OpenCL: Case Study in 3D FFT

Cited by: 10
Authors
Sanaullah, Ahmed [1]
Herbordt, Martin C. [1]
Affiliation
[1] Boston Univ, Dept Elect & Comp Engn, Boston, MA 02215 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
ACCELERATION
DOI
10.1145/3241793.3241800
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
FPGAs have typically achieved high speedups for 3D Fast Fourier Transforms (FFTs) due to the presence of hard floating-point units, low-latency specialized pipelines, and support for complex connectivity among processing elements. Previous implementations have relied on FFT IP cores for the computation because manually developing and maintaining or upgrading efficient pipelines in HDL is complex. These IP cores, however, are bulky and, because they use generic architectures, cannot be fully tuned for specific FFT sizes. HLS tools such as OpenCL offer a more customizable alternative but have delivered worse performance than HDL in previous work. In this paper we show that, with a set of code-structure optimizations, OpenCL designs can be compiled to Radix-2 FFT pipelines that outperform IP-core-based designs for the same throughput. We further show that the HDL generated by the OpenCL compiler can be isolated and seamlessly integrated into existing 3D FFT shells to reduce implementation effort. Our single-device design, tested on the Altera Arria10X115 FPGA, achieves an average speedup of 29x vs. CPU-MKL, 4.1x vs. GPU cuFFT, and 1.1x vs. IP-core FFT implementations for 16³, 32³ and 64³ FFTs. Moreover, the OpenCL-generated compute pipelines for 8³, 16³, 32³ and 64³ FFTs use on average 7.5x fewer ALMs and 1.6x fewer DSPs than the corresponding IP-core versions.
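For readers unfamiliar with the structure being accelerated, here is a minimal software sketch. It is not the authors' OpenCL kernels. It shows how an N×N×N 3D FFT decomposes into three passes of 1D Radix-2 FFTs, along x, then y, then z. Each 1D transform runs log2(N) butterfly stages; in a hardware pipeline, each stage would roughly correspond to one pipeline stage. The function names (`fft_radix2`, `fft3d`, `dft3d_naive`) and the nested-list layout `cube[z][y][x]` are illustrative assumptions. The naive DFT is included only as a correctness reference.

```python
import cmath
import random


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages, doubling the sub-transform size each time.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a


def fft3d(cube):
    """3D FFT of an n*n*n nested list cube[z][y][x] via three 1D FFT passes."""
    n = len(cube)
    data = [[list(row) for row in plane] for plane in cube]
    # Pass 1: transform along x (contiguous rows).
    for z in range(n):
        for y in range(n):
            data[z][y] = fft_radix2(data[z][y])
    # Pass 2: transform along y (columns within each plane).
    for z in range(n):
        for x in range(n):
            col = fft_radix2([data[z][y][x] for y in range(n)])
            for y in range(n):
                data[z][y][x] = col[y]
    # Pass 3: transform along z (across planes).
    for y in range(n):
        for x in range(n):
            pillar = fft_radix2([data[z][y][x] for z in range(n)])
            for z in range(n):
                data[z][y][x] = pillar[z]
    return data


def dft3d_naive(cube):
    """O(n^6) direct 3D DFT, used only to check fft3d."""
    n = len(cube)
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for kz in range(n):
        for ky in range(n):
            for kx in range(n):
                acc = 0j
                for z in range(n):
                    for y in range(n):
                        for x in range(n):
                            phase = -2j * cmath.pi * (kz * z + ky * y + kx * x) / n
                            acc += cube[z][y][x] * cmath.exp(phase)
                out[kz][ky][kx] = acc
    return out


if __name__ == "__main__":
    random.seed(1)
    n = 8
    cube = [[[complex(random.random(), random.random()) for _ in range(n)]
             for _ in range(n)] for _ in range(n)]
    fast, ref = fft3d(cube), dft3d_naive(cube)
    err = max(abs(fast[z][y][x] - ref[z][y][x])
              for z in range(n) for y in range(n) for x in range(n))
    print(f"max |radix-2 - naive| for {n}^3: {err:.2e}")
```

The three passes are independent batches of 1D transforms. This is why 3D FFT designs typically pair a 1D FFT pipeline with a "shell" that reorders data between passes. The sketch makes no attempt to model that data movement or fixed-point/floating-point hardware behavior.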
Pages: 6