A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing

Cited: 0
Authors
Cappello, John D. [1 ]
Strenski, Dave [1 ]
Affiliations
[1] Optimal Design Inc, Sewell, NJ USA
Source
PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13) | 2013
Keywords
FPGA; matrix multiplication; high performance computing; floating point arithmetic; multiply-accumulate; systolic array; hardware acceleration; GFLOPS; Xilinx; Virtex-7; DSP48; heavily-pipelined accumulators
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
A key enabler for Field Programmable Gate Arrays (FPGAs) in High Performance Computing (HPC) has been the addition of hard arithmetic cores. These "slices of DSP" dedicated to accelerated number crunching allow FPGAs to deliver more computing muscle, especially for floating point algorithms. This paper compares how an FPGA's performance in a practical HPC application measures up to its theoretical capacity. The implementation of a floating point matrix multiplication algorithm based on a 12×12 MAC (Multiply-Accumulate) array targeting the Xilinx Virtex-7 XT family is described. Several design techniques were used to ensure uninterrupted systolic operation of the array throughout execution, including a novel approach to handling heavily pipelined accumulators, as well as a scheme for overcoming the inherent inefficiencies of DDR3 memory. The result is a sustained "practical" performance range of 144-180 GFLOPS, compared to the target device's "theoretical" range of 257-290 GFLOPS.
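For intuition, the tiled multiply-accumulate scheme the abstract describes can be sketched in software: the output matrix is computed in 12×12 blocks, with each block element updated by one MAC operation per "beat" as operand pairs stream past. This is a minimal illustration of the general technique, not the paper's HDL design; the function name and loop structure here are our own assumptions.

```python
TILE = 12  # the paper's systolic array is 12x12 MAC units


def mac_tile_matmul(A, B):
    """Multiply A (m x k) by B (k x n), one TILE x TILE output block at a time."""
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, TILE):          # block row of C
        for j0 in range(0, n, TILE):      # block column of C
            # Each output block stays "resident", as it would in the hardware
            # array, while all k operand pairs stream past it.
            for p in range(k):
                for i in range(i0, min(i0 + TILE, m)):
                    a = A[i][p]
                    for j in range(j0, min(j0 + TILE, n)):
                        C[i][j] += a * B[p][j]  # one MAC per unit per beat
    return C
```

For scale, the reported sustained range of 144-180 GFLOPS against the theoretical 257-290 GFLOPS corresponds to roughly 56-62% of peak (144/257 ≈ 0.56, 180/290 ≈ 0.62).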
Pages: 160-167 (8 pages)
Related Papers (50 in total)
  • [1] Implementation of Vector Floating-point processing unit on FPGAs for high performance computing
    Chen, Shi
    Venkatesan, Ramachandran
    Gillard, Paul
    2008 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-4, 2008, : 840 - 844
  • [2] A FPGA Floating Point Interpolator
    Balas, Marius M.
    Socaci, Marius
    Olaru, Onisifor
    SOFT COMPUTING APPLICATIONS, 2013, 195 : 331 - 336
  • [3] Neural network training based on FPGA with floating point number format and it’s performance
    Mehmet Ali Çavuşlu
    Cihan Karakuzu
    Suhap Şahin
    Mehmet Yakut
    Neural Computing and Applications, 2011, 20 : 195 - 202
  • [4] Neural network training based on FPGA with floating point number format and it's performance
    Cavuslu, Mehmet Ali
    Karakuzu, Cihan
    Sahin, Suhap
    Yakut, Mehmet
    NEURAL COMPUTING & APPLICATIONS, 2011, 20 (02): : 195 - 202
  • [5] Design Optimization for High-Performance Computing Using FPGA
    Isik, Murat
    Inadagbo, Kayode
    Aktas, Hakan
    INFORMATION MANAGEMENT AND BIG DATA, SIMBIG 2023, 2024, 2142 : 142 - 156
  • [6] OmpSs@FPGA Framework for High Performance FPGA Computing
    Miguel de Haro, Juan
    Bosch, Jaume
    Filgueras, Antonio
    Vidal, Miquel
    Jimenez-Gonzalez, Daniel
    Alvarez, Carlos
    Martorell, Xavier
    Ayguade, Eduard
    Labarta, Jesus
    IEEE TRANSACTIONS ON COMPUTERS, 2021, 70 (12) : 2029 - 2042
  • [7] High-performance Face Detection with CPU-FPGA Acceleration
    Mohanty, Abinash
    Suda, Naveen
    Kim, Minkyu
    Vrudhula, Sarma
    Seo, Jae-sun
    Cao, Yu
    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2016, : 117 - 120
  • [8] FPGA Implementation of a Custom Floating-Point Library
    Campos, Nelson
    Edirisinghe, Eran
    Fatima, Shaheen
    Chesnokov, Slava
    Lluis, Alexis
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 527 - 542
  • [9] Efficient Implementation of Floating-Point Reciprocator on FPGA
    Jaiswal, Manish Kumar
    Chandrachoodan, Nitin
    22ND INTERNATIONAL CONFERENCE ON VLSI DESIGN HELD JOINTLY WITH 8TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, PROCEEDINGS, 2009, : 267 - 271
  • [10] FPGA implementation of the high-speed floating-point operation
    Ji, XS
    Wang, SR
    ICEMI 2005: Conference Proceedings of the Seventh International Conference on Electronic Measurement & Instruments, Vol 3, 2005, : 626 - 629