A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing

Cited by: 0

Authors:

Cappello, John D. [1]

Strenski, Dave [1]

Affiliations:

[1] Optimal Design Inc, Sewell, NJ USA

Source:

PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13) | 2013

Keywords:

FPGA; matrix multiplication; high performance computing; floating point arithmetic; multiply-accumulate; systolic array; hardware acceleration; GFLOPS; Xilinx; Virtex-7; DSP48; heavily-pipelined accumulators;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

A key enabler for Field Programmable Gate Arrays (FPGAs) in High Performance Computing (HPC) has been the addition of hard arithmetic cores. These "slices of DSP" dedicated to accelerated number crunching allow FPGAs to deliver more computing muscle, especially for floating point algorithms. This paper compares how an FPGA's performance in a practical HPC application measures up to its theoretical capacity. The implementation of a floating point matrix multiplication algorithm based on a 12x12 MAC (Multiply-Accumulate) array targeting the Xilinx Virtex 7 XT family is described. Several design techniques were used to ensure uninterrupted systolic operation of the array throughout execution, including a novel approach to handling heavily pipelined accumulators, as well as a scheme for overcoming the inherent inefficiencies of DDR3 memory. The result is a sustained "practical" performance range of 144-180 GFLOPS, compared to the target device's "theoretical" range of 257-290 GFLOPS.
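
To make the MAC-array idea in the abstract concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes each cell of a 12x12 grid holds one accumulator and performs one multiply-accumulate per step, so that a 12x12 block of the product is built up over as many steps as the shared dimension is long. The names `TILE`, `mac_tile`, and `matmul_tiled` are hypothetical, and the pipelining, systolic data movement, and DDR3 handling mentioned in the abstract are not modeled.

```python
TILE = 12  # array dimension taken from the abstract (12x12 MAC cells)


def mac_tile(a_rows, b_cols, depth, tile=TILE):
    """Model one tile x tile block of C = A @ B.

    a_rows: `tile` rows of A, each of length `depth`
    b_cols: `tile` columns of B, each of length `depth`
    Returns the tile x tile block of accumulated sums.
    """
    acc = [[0.0] * tile for _ in range(tile)]
    for k in range(depth):
        # One step: every cell (i, j) does a single multiply-accumulate.
        for i in range(tile):
            for j in range(tile):
                acc[i][j] += a_rows[i][k] * b_cols[j][k]
    return acc


def matmul_tiled(A, B, tile=TILE):
    """Multiply A (n x depth) by B (depth x m) one tile-sized block at a time.

    Assumes n and m are multiples of `tile` (a simplification for this sketch).
    """
    n, depth, m = len(A), len(B), len(B[0])
    assert len(A[0]) == depth, "inner dimensions must match"
    assert n % tile == 0 and m % tile == 0, "sketch requires tile-aligned sizes"
    C = [[0.0] * m for _ in range(n)]
    for r in range(0, n, tile):
        a_rows = A[r:r + tile]
        for c in range(0, m, tile):
            # Gather the tile's columns of B so each cell reads b_cols[j][k].
            b_cols = [[B[k][c + j] for k in range(depth)] for j in range(tile)]
            block = mac_tile(a_rows, b_cols, depth, tile)
            for i in range(tile):
                C[r + i][c:c + tile] = block[i]
    return C
```

In this model each step performs 144 multiply-accumulates, which is the kind of per-step work a 12x12 array would carry out in parallel; the sketch only illustrates the arithmetic, not the timing.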

Pages: 160-167

Page count: 8

50 entries in total

[41] OpenCL-ready High Speed FPGA Network for Reconfigurable High Performance Computing
Kobayashi, Ryohei
Oobata, Yuma
Fujita, Norihisa
Yamaguchi, Yoshiki
Boku, Taisuke
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2018), 2018, : 192 - 201
[42] Efficient Implementation Of Single Precision Floating Point Processor In FPGA
Lasith, K. K.
Thomas, Anoop
2014 ANNUAL INTERNATIONAL CONFERENCE ON EMERGING RESEARCH AREAS: MAGNETICS, MACHINES AND DRIVES (AICERA/ICMMD), 2014,
[43] A High-Performance Accelerator for Floating-Point Matrix Multiplication
Jia, Xun
Wu, Gunning
Xie, Xianghui
2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017), 2017, : 396 - 402
[44] FPGA Optimizations for a Pipelined Floating-Point Exponential Unit
Alachiotis, Nikolaos
Stamatakis, Alexandros
RECONFIGURABLE COMPUTING: ARCHITECTURES, TOOLS AND APPLICATIONS, 2011, 6578 : 316 - 327
[45] Design and Implementation of an Embedded FPGA Floating Point DSP Block
Langhammer, Martin
Pasca, Bogdan
IEEE 22ND SYMPOSIUM ON COMPUTER ARITHMETIC ARITH 22, 2015, : 26 - 33
[46] Resource- and Power-Efficient High-Performance Object Detection Inference Acceleration Using FPGA
Tesema, Solomon Negussie
Bourennane, El-Bay
ELECTRONICS, 2022, 11 (12)
[47] Hardware Realization of High-Speed Area-Efficient Floating Point Arithmetic Unit on FPGA
Yacoub, Mohammed H.
Ismail, Samar M.
Said, Lobna A.
2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND SMART INNOVATION, ICMISI 2024, 2024, : 190 - 193
[48] High-Level Languages and Floating-Point Arithmetic for FPGA-Based CFD Simulations
Sanchez-Roman, Diego
Sutter, Gustavo
Lopez-Buedo, Sergio
Gonzalez, Ivan
Gomez-Arribas, Francisco J.
Aracil, Javier
Palacios, Francisco
IEEE DESIGN & TEST OF COMPUTERS, 2011, 28 (04): : 28 - 36
[49] Data-Intensive Computing Acceleration with Python in Xilinx FPGA
Yang, Yalin
Xu, Linjie
Xu, Zichen
Wang, Yuhao
DATA QUALITY AND TRUST IN BIG DATA, 2019, 11235 : 111 - 124
[50] ConfAx: Exploiting Approximate Computing for Configurable FPGA CNN Acceleration at the Edge
Korol, Guilherme
Jordan, Michael Guilherme
Rutzig, Mateus Beck
Schneider Beck, Antonio Carlos
2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 1650 - 1654
