A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1

Authors:

Jia, Xun [1]

Wu, Gunning [1]

Xie, Xianghui [1]

Affiliations:

[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China

Source:

2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017

Funding:

National Science Foundation (USA);

Keywords:

matrix multiplication; linear array; accelerator; high-performance; architecture;

DOI:

10.1109/ISPA/IUCC.2017.00063

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs, and FPGAs, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirements of HPC applications.

Pages: 396-402

Page count: 7
