A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing

Cited: 0
Authors
Cappello, John D. [1 ]
Strenski, Dave [1 ]
Affiliations
[1] Optimal Design Inc, Sewell, NJ USA
Source
PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13) | 2013
Keywords
FPGA; matrix multiplication; high performance computing; floating point arithmetic; multiply-accumulate; systolic array; hardware acceleration; GFLOPS; Xilinx; Virtex-7; DSP48; heavily-pipelined accumulators
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
A key enabler for Field Programmable Gate Arrays (FPGAs) in High Performance Computing (HPC) has been the addition of hard arithmetic cores. These "slices of DSP" dedicated to accelerated number crunching allow FPGAs to deliver more computing muscle, especially for floating point algorithms. This paper compares how an FPGA's performance in a practical HPC application measures up to its theoretical capacity. The implementation of a floating point matrix multiplication algorithm based on a 12×12 MAC (Multiply-Accumulate) array targeting the Xilinx Virtex-7 XT family is described. Several design techniques were used to ensure uninterrupted systolic operation of the array throughout execution, including a novel approach to handling heavily pipelined accumulators, as well as a scheme for overcoming the inherent inefficiencies of DDR3 memory. The result is a sustained "practical" performance range of 144-180 GFLOPS, compared to the target device's "theoretical" range of 257-290 GFLOPS.
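For intuition, the tiled multiply-accumulate scheme the abstract describes can be sketched in software: the output matrix is computed in 12×12 blocks, with each block element updated by one MAC operation per "beat" as operand pairs stream past. This is a minimal illustration of the general technique, not the paper's HDL design; the function name and loop structure here are our own assumptions.

```python
TILE = 12  # the paper's systolic array is 12x12 MAC units


def mac_tile_matmul(A, B):
    """Multiply A (m x k) by B (k x n), one TILE x TILE output block at a time."""
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, TILE):          # block row of C
        for j0 in range(0, n, TILE):      # block column of C
            # Each output block stays "resident", as it would in the hardware
            # array, while all k operand pairs stream past it.
            for p in range(k):
                for i in range(i0, min(i0 + TILE, m)):
                    a = A[i][p]
                    for j in range(j0, min(j0 + TILE, n)):
                        C[i][j] += a * B[p][j]  # one MAC per unit per beat
    return C
```

For scale, the reported sustained range of 144-180 GFLOPS against the theoretical 257-290 GFLOPS corresponds to roughly 56-62% of peak (144/257 ≈ 0.56, 180/290 ≈ 0.62).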
Pages: 160-167 (8 pages)
Related Papers (50 in total)
  • [1] Implementation of Vector Floating-point processing unit on FPGAs for high performance computing
    Chen, Shi
    Venkatesan, Ramachandran
    Gillard, Paul
    2008 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-4, 2008, : 840 - 844
  • [2] A FPGA Floating Point Interpolator
    Balas, Marius M.
    Socaci, Marius
    Olaru, Onisifor
    SOFT COMPUTING APPLICATIONS, 2013, 195 : 331 - 336
  • [3] Neural network training based on FPGA with floating point number format and it’s performance
    Mehmet Ali Çavuşlu
    Cihan Karakuzu
    Suhap Şahin
    Mehmet Yakut
    Neural Computing and Applications, 2011, 20 : 195 - 202
  • [4] Neural network training based on FPGA with floating point number format and it's performance
    Cavuslu, Mehmet Ali
    Karakuzu, Cihan
    Sahin, Suhap
    Yakut, Mehmet
    NEURAL COMPUTING & APPLICATIONS, 2011, 20 (02): : 195 - 202
  • [5] Design Optimization for High-Performance Computing Using FPGA
    Isik, Murat
    Inadagbo, Kayode
    Aktas, Hakan
    INFORMATION MANAGEMENT AND BIG DATA, SIMBIG 2023, 2024, 2142 : 142 - 156
  • [6] OmpSs@FPGA Framework for High Performance FPGA Computing
    Miguel de Haro, Juan
    Bosch, Jaume
    Filgueras, Antonio
    Vidal, Miquel
    Jimenez-Gonzalez, Daniel
    Alvarez, Carlos
    Martorell, Xavier
    Ayguade, Eduard
    Labarta, Jesus
    IEEE TRANSACTIONS ON COMPUTERS, 2021, 70 (12) : 2029 - 2042
  • [7] High-performance Face Detection with CPU-FPGA Acceleration
    Mohanty, Abinash
    Suda, Naveen
    Kim, Minkyu
    Vrudhula, Sarma
    Seo, Jae-sun
    Cao, Yu
    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2016, : 117 - 120
  • [8] FPGA Implementation of a Custom Floating-Point Library
    Campos, Nelson
    Edirisinghe, Eran
    Fatima, Shaheen
    Chesnokov, Slava
    Lluis, Alexis
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 527 - 542
  • [9] Efficient Implementation of Floating-Point Reciprocator on FPGA
    Jaiswal, Manish Kumar
    Chandrachoodan, Nitin
    22ND INTERNATIONAL CONFERENCE ON VLSI DESIGN HELD JOINTLY WITH 8TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, PROCEEDINGS, 2009, : 267 - 271
  • [10] FPGA implementation of the high-speed floating-point operation
    Ji, XS
    Wang, SR
    ICEMI 2005: Conference Proceedings of the Seventh International Conference on Electronic Measurement & Instruments, Vol 3, 2005, : 626 - 629