A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing

Cited by: 0
Authors
Cappello, John D. [1]
Strenski, Dave [1]
Affiliations
[1] Optimal Design Inc, Sewell, NJ USA
Source
PROCEEDINGS OF THE 2013 IEEE 24TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 13) | 2013
Keywords
FPGA; matrix multiplication; high performance computing; floating point arithmetic; multiply-accumulate; systolic array; hardware acceleration; GFLOPS; Xilinx; Virtex-7; DSP48; heavily-pipelined accumulators;
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
A key enabler for Field Programmable Gate Arrays (FPGAs) in High Performance Computing (HPC) has been the addition of hard arithmetic cores. These "slices of DSP" dedicated to accelerated number crunching allow FPGAs to deliver more computing muscle, especially for floating point algorithms. This paper compares how an FPGA's performance in a practical HPC application measures up to its theoretical capacity. The implementation of a floating point matrix multiplication algorithm based on a 12x12 MAC (Multiply-Accumulate) array targeting the Xilinx Virtex 7 XT family is described. Several design techniques were used to ensure uninterrupted systolic operation of the array throughout execution, including a novel approach to handling heavily pipelined accumulators, as well as a scheme for overcoming the inherent inefficiencies of DDR3 memory. The result is a sustained "practical" performance range of 144-180 GFLOPS, compared to the target device's "theoretical" range of 257-290 GFLOPS.
Pages: 160-167
Page count: 8
Related papers
50 in total
  • [21] Floating Point FPGA Architecture of PID Controller
    Wadgaonkar, Jagannath
    Bhole, Kalyani
    Singh, Prateek
    2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INSTRUMENTATION AND CONTROL (ICIC), 2015, : 1259 - 1263
  • [22] FPGA Implementation of Hybrid Fixed Point - Floating Point Multiplication
    Amaricai, Alexandru
    Boncalo, Oana
    Sicoe, Ovidiu
    Marcu, Marius
    MIXED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, MIXDES 2013, 2013, : 243 - 246
  • [23] FPGA Implementation of Vedic Floating Point Multiplier
    Kodali, Ravi Kishore
    Boppana, Lakshmi
    Yenamachintala, Sai Sourabh
    2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS (SPICES), 2015,
  • [24] Integer vs. Floating-Point Processing on Modern FPGA
    Hettiarachchi, Don Lahiru Nirmal
    Davuluru, Venkata Salini Priyamvada
    Balster, Eric J.
    2020 10TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2020, : 606 - 612
  • [25] Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance
    Underwood, KD
    Hemmert, KS
    12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004, : 219 - 228
  • [26] Power and Performance Tradeoff of a Floating-point Intensive Kernel on OpenCL FPGA Platform
    Jin, Zheming
    Finkel, Hal
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 716 - 720
  • [27] Performance evaluation of Stratix V DE5-Net FPGA board for high performance computing
    Firmansyah, Iman
    Yamaguchi, Yoshiki
    Boku, Taisuke
    2016 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS, AND ITS APPLICATIONS (IC3INA) - RECENT PROGRESS IN COMPUTER, CONTROL, AND INFORMATICS FOR DATA SCIENCE, 2016, : 23 - 27
  • [28] FPGA based hardware architectures for high performance computing applications
    Belean, Bogdan
    Pogacian, Sergiu
    Bot, Adrian
    2012 5TH ROMANIA TIER 2 FEDERATION GRID, CLOUD & HIGH PERFORMANCE COMPUTING SCIENCE (RO-LCG), 2012, : 11 - 14
  • [29] Computing Acceleration of FMM Algorithm on the Basis of FPGA and GPU
    Chai, Yahui
    Shen, Wenfeng
    Xu, Weimin
    Zheng, Yanheng
    MATERIALS PROCESSING TECHNOLOGY, PTS 1-4, 2011, 291-294 : 3272 - 3277
  • [30] Integrating FPGAs: A dynamically reconfigurable FPGA-based grid for High Performance Computing
    Dondo Gazzano, Julio
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, ELECTRONIC AND SYSTEMS ENGINEERING (ICAEES), 2016, : 1 - 4