A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing

Cited by: 0

Authors:

Cappello, John D. [1]

Strenski, Dave [1]

Affiliations:

[1] Optimal Design Inc, Sewell, NJ USA

Source:

PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13) | 2013

Keywords:

FPGA; matrix multiplication; high performance computing; floating point arithmetic; multiply-accumulate; systolic array; hardware acceleration; GFLOPS; Xilinx; Virtex-7; DSP48; heavily-pipelined accumulators;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

A key enabler for Field Programmable Gate Arrays (FPGAs) in High Performance Computing (HPC) has been the addition of hard arithmetic cores. These "slices of DSP" dedicated to accelerated number crunching allow FPGAs to deliver more computing muscle, especially for floating point algorithms. This paper compares how an FPGA's performance in a practical HPC application measures up to its theoretical capacity. The implementation of a floating point matrix multiplication algorithm based on a 12x12 MAC (Multiply-Accumulate) array targeting the Xilinx Virtex 7 XT family is described. Several design techniques were used to ensure uninterrupted systolic operation of the array throughout execution, including a novel approach to handling heavily pipelined accumulators, as well as a scheme for overcoming the inherent inefficiencies of DDR3 memory. The result is a sustained "practical" performance range of 144-180 GFLOPS, compared to the target device's "theoretical" range of 257-290 GFLOPS.
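The arithmetic behind the figures in the abstract can be illustrated with a minimal sketch. It assumes each MAC unit contributes two floating point operations (one multiply, one add) per clock cycle. The clock frequencies of 500 MHz and 625 MHz are hypothetical values, chosen only because they reproduce the 144 and 180 GFLOPS endpoints for a 12x12 array; the abstract does not state the clock rates. Pairing the low ends and the high ends of the two ranges to form an efficiency ratio is likewise an assumption.

```python
def mac_array_gflops(rows: int, cols: int, clock_mhz: float) -> float:
    """Peak GFLOPS of a rows x cols MAC array.

    Assumes every MAC completes one multiply and one add (2 FLOPs)
    on every clock cycle, i.e. the array is never stalled.
    """
    flops_per_cycle = rows * cols * 2
    return flops_per_cycle * clock_mhz / 1000.0


def efficiency(sustained_gflops: float, theoretical_gflops: float) -> float:
    """Fraction of the theoretical throughput that is actually sustained."""
    return sustained_gflops / theoretical_gflops


if __name__ == "__main__":
    # Hypothetical clock rates (not given in the abstract).
    for clock_mhz in (500.0, 625.0):
        gflops = mac_array_gflops(12, 12, clock_mhz)
        print(f"12x12 array at {clock_mhz:.0f} MHz: {gflops:.0f} GFLOPS")

    # Sustained vs. theoretical ranges quoted in the abstract,
    # paired low-with-low and high-with-high (an assumption).
    for sustained, theoretical in ((144.0, 257.0), (180.0, 290.0)):
        ratio = efficiency(sustained, theoretical)
        print(f"{sustained:.0f} / {theoretical:.0f} GFLOPS = {ratio:.1%}")
```

Under these assumptions, the sustained figures correspond to roughly 56% to 62% of the theoretical capacity quoted for the target device.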


Pages: 160-167

Page count: 8

50 items in total

[21] Floating Point FPGA Architecture of PID Controller
Wadgaonkar, Jagannath
Bhole, Kalyani
Singh, Prateek
2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INSTRUMENTATION AND CONTROL (ICIC), 2015, : 1259 - 1263
[22] FPGA Implementation of Hybrid Fixed Point - Floating Point Multiplication
Amaricai, Alexandru
Boncalo, Oana
Sicoe, Ovidiu
Marcu, Marius
MIXED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, MIXDES 2013, 2013, : 243 - 246
[23] FPGA Implementation of Vedic Floating Point Multiplier
Kodali, Ravi Kishore
Boppana, Lakshmi
Yenamachintala, Sai Sourabh
2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS (SPICES), 2015,
[24] Integer vs. Floating-Point Processing on Modern FPGA
Hettiarachchi, Don Lahiru Nirmal
Davuluru, Venkata Salini Priyamvada
Balster, Eric J.
2020 10TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2020, : 606 - 612
[25] Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance
Underwood, KD
Hemmert, KS
12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004, : 219 - 228
[26] Power and Performance Tradeoff of a Floating-point Intensive Kernel on OpenCL FPGA Platform
Jin, Zheming
Finkel, Hal
2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 716 - 720
[27] Performance evaluation of Stratix V DE5-Net FPGA board for high performance computing
Firmansyah, Iman
Yamaguchi, Yoshiki
Boku, Taisuke
2016 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS, AND ITS APPLICATIONS (IC3INA) - RECENT PROGRESS IN COMPUTER, CONTROL, AND INFORMATICS FOR DATA SCIENCE, 2016, : 23 - 27
[28] FPGA based hardware architectures for high performance computing applications
Belean, Bogdan
Pogacian, Sergiu
Bot, Adrian
2012 5TH ROMANIA TIER 2 FEDERATION GRID, CLOUD & HIGH PERFORMANCE COMPUTING SCIENCE (RO-LCG), 2012, : 11 - 14
[29] Computing Acceleration of FMM Algorithm on the Basis of FPGA and GPU
Chai, Yahui
Shen, Wenfeng
Xu, Weimin
Zheng, Yanheng
MATERIALS PROCESSING TECHNOLOGY, PTS 1-4, 2011, 291-294 : 3272 - 3277
[30] Integrating FPGAs: A dynamically reconfigurable FPGA-based grid for High Performance Computing
Dondo Gazzano, Julio
2016 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, ELECTRONIC AND SYSTEMS ENGINEERING (ICAEES), 2016, : 1 - 4
