A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1
Authors
Jia, Xun [1]
Wu, Gunning [1]
Xie, Xianghui [1]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China
Source
2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017
Funding
National Science Foundation (USA);
Keywords
matrix multiplication; linear array; accelerator; high-performance; architecture;
DOI
10.1109/ISPA/IUCC.2017.00063
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPU, GPGPU and FPGA, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirements of HPC applications.
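The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes that "efficiency" means achieved performance divided by peak performance, and it further assumes, purely for illustration, that each PE completes one multiply-add (2 floating-point operations) per cycle. Under those assumptions the quoted numbers imply a peak of roughly 768 GFLOPS, or about 3 GFLOPS per PE.

```python
# Figures quoted in the abstract.
NUM_PES = 256
ACHIEVED_GFLOPS = 767.99
EFFICIENCY = 0.9999  # 99.99%

# Assumption: efficiency = achieved performance / peak performance.
implied_peak_gflops = ACHIEVED_GFLOPS / EFFICIENCY
per_pe_gflops = implied_peak_gflops / NUM_PES

# Hypothetical assumption (not stated in the abstract): each PE retires
# one multiply-add, i.e. 2 floating-point operations, per clock cycle.
FLOPS_PER_PE_PER_CYCLE = 2
implied_clock_ghz = per_pe_gflops / FLOPS_PER_PE_PER_CYCLE

print(f"implied peak:   {implied_peak_gflops:.2f} GFLOPS")
print(f"per-PE peak:    {per_pe_gflops:.3f} GFLOPS")
print(f"implied clock:  {implied_clock_ghz:.3f} GHz (under the 2 FLOPs/cycle assumption)")
```

The implied clock frequency depends entirely on the hypothetical per-cycle throughput; a different PE design would yield a different value for the same peak.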
Pages: 396-402
Number of pages: 7
Related papers
50 in total
  • [1] FPGA accelerator for floating-point matrix multiplication
    Jovanovic, Z.
    Milutinovic, V.
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2012, 6 (04) : 249 - 256
  • [2] An optimized floating-point matrix multiplication on FPGA
    Zhang, T., 1832, Asian Network for Scientific Information (12): : 1832 - 1838
  • [3] Floating-point matrix multiplication in a polymorphic processor
    Kuzmanov, Georgi
    van Oijen, Wouter M.
    ICFPT 2007: INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY, PROCEEDINGS, 2007, : 249 - 252
  • [4] A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing
    Li, Kai
    Mao, Wei
    Zhou, Junzhuo
    Li, Boyu
    Yang, Zhengke
    Yang, Shuxing
    Du, Laimin
    Huang, Sixiao
    Yu, Hao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (10) : 4123 - 4127
  • [5] Fast algorithms for floating-point interval matrix multiplication
    Ozaki, Katsuhisa
    Ogita, Takeshi
    Rump, Siegfried M.
    Oishi, Shin'ichi
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2012, 236 (07) : 1795 - 1814
  • [6] A Coprocessor for Double-Precision Floating-Point Matrix Multiplication
    Jia X.
    Wu G.
    Xie X.
    Wu D.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (02): : 410 - 420
  • [7] A High Performance Accelerator Design for Ultra-Long Point Floating-Point FFT
    Wang D.
    Shi S.
    Wu T.
    Liu L.
    Tan H.
    Hao Z.
    Guo F.
    Li H.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (06): : 1192 - 1203
  • [8] Performance Evaluation of Strassen Matrix Multiplication Supporting Triple-Double Precision Floating-Point Arithmetic
    Kouya, Tomonori
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT V, 2020, 12253 : 163 - 176
  • [9] SWM: A High-Performance Sparse-Winograd Matrix Multiplication CNN Accelerator
    Wu, Di
    Fan, Xitian
    Cao, Wei
    Wang, Lingli
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2021, 29 (05) : 936 - 949
  • [10] Anatomy of high-performance matrix multiplication
    Goto, Kazushige
    Van De Geijn, Robert A.
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2008, 34 (03):