A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1
Authors
Jia, Xun [1]
Wu, Gunning [1]
Xie, Xianghui [1]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
matrix multiplication; linear array; accelerator; high-performance; architecture;
DOI
10.1109/ISPA/IUCC.2017.00063
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs, and FPGAs, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirements of HPC applications.
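The performance and efficiency figures in the abstract can be related by simple arithmetic. The sketch below is only an illustration: it assumes that efficiency means achieved throughput divided by peak throughput, and that each PE completes one multiply and one add per cycle (2 floating-point operations). Neither assumption is stated in this record, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
NUM_PES = 256
REPORTED_GFLOPS = 767.99
REPORTED_EFFICIENCY = 0.9999  # 99.99%


def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Peak throughput implied by efficiency = achieved / peak (assumed definition)."""
    return achieved_gflops / efficiency


def implied_clock_ghz(peak_gflops: float, num_pes: int,
                      flops_per_pe_per_cycle: int = 2) -> float:
    """Clock rate implied by the peak throughput.

    ASSUMPTION: each PE performs one multiply and one add per cycle,
    i.e. 2 floating-point operations; the record does not confirm this.
    """
    return peak_gflops / (num_pes * flops_per_pe_per_cycle)


peak = implied_peak_gflops(REPORTED_GFLOPS, REPORTED_EFFICIENCY)
per_pe = peak / NUM_PES
clock = implied_clock_ghz(peak, NUM_PES)

print(f"implied peak:      {peak:.2f} GFLOPS")
print(f"implied per-PE:    {per_pe:.2f} GFLOPS")
print(f"implied clock:     {clock:.2f} GHz (under the 2 flops/cycle assumption)")
```

Under these assumptions the reported numbers are consistent with a peak of roughly 768 GFLOPS, or about 3 GFLOPS per PE. The clock estimate is an inference from the abstract, not a value reported in this record.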
Pages: 396-402
Number of pages: 7
Related papers
50 in total
  • [1] FPGA accelerator for floating-point matrix multiplication
    Jovanovic, Z.
    Milutinovic, V.
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2012, 6 (04): : 249 - 256
  • [2] Energy performance of floating-point matrix multiplication on FPGAs
    Zhuo, L
    Prasanna, VK
    ERSA '04: THE 2004 INTERNATIONAL CONFERENCE ON ENGINEERING OF RECONFIGURABLE SYSTEMS AND ALGORITHMS, 2004, : 316 - 316
  • [3] An optimized floating-point matrix multiplication on FPGA
    Zhang, T., 1832, Asian Network for Scientific Information (12):
  • [4] Floating-point matrix multiplication in a polymorphic processor
    Kuzmanov, Georgi
    van Oijen, Wouter M.
    ICFPT 2007: INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY, PROCEEDINGS, 2007, : 249 - 252
  • [5] A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing
    Li, Kai
    Zhou, Junzhuo
    Li, Boyu
    Yang, Shuxing
    Huang, Sixiao
    Luo, Shaobo
    Mao, Wei
    Yu, Hao
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 226 - 229
  • [6] A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing
    Li, Kai
    Mao, Wei
    Zhou, Junzhuo
    Li, Boyu
    Yang, Zhengke
    Yang, Shuxing
    Du, Laimin
    Huang, Sixiao
    Yu, Hao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (10) : 4123 - 4127
  • [7] Fast algorithms for floating-point interval matrix multiplication
    Ozaki, Katsuhisa
    Ogita, Takeshi
    Rump, Siegfried M.
    Oishi, Shin'ichi
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2012, 236 (07) : 1795 - 1814
  • [8] HIGH-PERFORMANCE FLOATING-POINT IMPLEMENTATION USING FPGAS
    Parker, Michael
    MILCOM 2009 - 2009 IEEE MILITARY COMMUNICATIONS CONFERENCE, VOLS 1-4, 2009, : 323 - 327
  • [9] GENERATING HIGH-PERFORMANCE CUSTOM FLOATING-POINT PIPELINES
    de Dinechin, Florent
    Klein, Cristian
    Pasca, Bogdan
    FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 59 - 64
  • [10] Decimal Floating-Point Multiplication
    Erle, Mark A.
    Hickmann, Brian J.
    Schulte, Michael J.
    IEEE TRANSACTIONS ON COMPUTERS, 2009, 58 (07) : 902 - 916