A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing

Cited by: 0

Authors:

Cappello, John D. [1]

Strenski, Dave [1]

Affiliation:

[1] Optimal Design Inc., Sewell, NJ, USA

Source:

PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13) | 2013

Keywords:

FPGA; matrix multiplication; high performance computing; floating point arithmetic; multiply-accumulate; systolic array; hardware acceleration; GFLOPS; Xilinx; Virtex-7; DSP48; heavily-pipelined accumulators;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

A key enabler for Field Programmable Gate Arrays (FPGAs) in High Performance Computing (HPC) has been the addition of hard arithmetic cores. These "slices of DSP" dedicated to accelerated number crunching allow FPGAs to deliver more computing muscle, especially for floating point algorithms. This paper compares how an FPGA's performance in a practical HPC application measures up to its theoretical capacity. The implementation of a floating point matrix multiplication algorithm based on a 12x12 MAC (Multiply-Accumulate) array targeting the Xilinx Virtex 7 XT family is described. Several design techniques were used to ensure uninterrupted systolic operation of the array throughout execution, including a novel approach to handling heavily pipelined accumulators, as well as a scheme for overcoming the inherent inefficiencies of DDR3 memory. The result is a sustained "practical" performance range of 144-180 GFLOPS, compared to the target device's "theoretical" range of 257-290 GFLOPS.
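The figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is not taken from the paper: it assumes each MAC unit in the 12x12 array retires one multiply and one add (2 floating point operations) per clock cycle, and it pairs the low end of the practical range with the low end of the theoretical range, and likewise the high ends. The function names are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope arithmetic on the abstract's figures.
# Assumptions (not stated in the source): 2 FLOPs per MAC per cycle,
# and low/low, high/high pairing of the practical and theoretical ranges.

MAC_ROWS = 12
MAC_COLS = 12
FLOPS_PER_MAC_PER_CYCLE = 2  # assumed: one multiply plus one add


def efficiency(practical_gflops, theoretical_gflops):
    """Fraction of the theoretical rate that is actually sustained."""
    return practical_gflops / theoretical_gflops


def implied_clock_mhz(gflops, rows=MAC_ROWS, cols=MAC_COLS):
    """Clock rate at which a fully busy rows x cols MAC array yields `gflops`."""
    flops_per_cycle = rows * cols * FLOPS_PER_MAC_PER_CYCLE
    return gflops * 1000.0 / flops_per_cycle


if __name__ == "__main__":
    for practical, theoretical in [(144, 257), (180, 290)]:
        print(
            f"{practical} of {theoretical} GFLOPS: "
            f"{efficiency(practical, theoretical):.0%} of theoretical, "
            f"implied array clock {implied_clock_mhz(practical):.0f} MHz"
        )
```

Under these assumptions the sustained rate is roughly 56% to 62% of the theoretical rate, and a fully occupied 144-MAC array would need to run at about 500 to 625 MHz to reach the quoted practical range.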

Pages: 160-167

Page count: 8
