MALMM: A multi-array architecture for large-scale matrix multiplication on FPGA

Cited: 4
Authors
Huang, You [1 ,2 ]
Shen, Junzhong [1 ,2 ]
Qiao, Yuran [1 ,2 ]
Wen, Mei [1 ,2 ]
Zhang, Chunyuan [1 ,2 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Proc, Changsha 410073, Hunan, Peoples R China
Source
IEICE ELECTRONICS EXPRESS | 2018, Vol. 15, No. 10
Keywords
matrix multiplication; field-programmable gate arrays (FPGAs); work-stealing;
DOI
10.1587/elex.15.20180286
CLC Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Code
0808; 0809;
Abstract
Large-scale floating-point matrix multiplication is widely used in many scientific and engineering applications. Most existing works focus on designing a linear array architecture to accelerate matrix multiplication on FPGAs. This paper extends that architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we present a work-stealing scheme to ensure balanced workload partitioning among the multiple linear arrays. Furthermore, an analytical model is developed to determine the optimal parameters for matrix multiplication acceleration. Experiments on real-life convolutional neural networks (CNNs) show that we can obtain the optimal extension of the linear array architecture.
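The abstract's work-stealing scheme balances matrix tiles across multiple linear arrays: an array that drains its own queue steals pending tiles from the busiest peer. The sketch below illustrates only this general work-stealing idea in Python; the function name, the round-robin seeding, and the steal-from-busiest policy are illustrative assumptions, not the paper's actual hardware scheduler.

```python
from collections import deque

def work_stealing_run(num_workers, tiles):
    """Simulate work-stealing over a list of matrix tiles.

    Illustrative sketch only: tiles are seeded round-robin, each
    worker consumes from the head of its own queue, and an idle
    worker steals from the tail of the fullest peer queue.
    """
    queues = [deque() for _ in range(num_workers)]
    for i, tile in enumerate(tiles):
        queues[i % num_workers].append(tile)  # static initial partition

    done = [[] for _ in range(num_workers)]
    progress = True
    while progress:
        progress = False
        for w in range(num_workers):
            if queues[w]:
                done[w].append(queues[w].popleft())  # own work: head
                progress = True
            else:
                victim = max(range(num_workers), key=lambda v: len(queues[v]))
                if queues[victim]:
                    done[w].append(queues[victim].pop())  # steal: tail
                    progress = True
    return done
```

Even with a skewed initial partition, every tile is processed exactly once, and formerly idle workers finish with a nonempty share, which is the load-balancing effect the paper's scheme targets.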
Pages: 12
Related Papers
13 records
  • [1] [Anonymous], 2014, LEARNING FACE REPRES
  • [2] Scheduling multithreaded computations by work stealing
    Blumofe, RD
    Leiserson, CE
    [J]. JOURNAL OF THE ACM, 1999, 46 (05) : 720 - 748
  • [3] Cong Jason, 2014, Artificial Neural Networks and Machine Learning - ICANN 2014. 24th International Conference on Artificial Neural Networks. Proceedings: LNCS 8681, P281, DOI 10.1007/978-3-319-11179-7_36
  • [4] Dou, Yong, 2005, Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, FPGA'05, P86, DOI 10.1145/1046192.1046204
  • [5] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [6] FPGA accelerator for floating-point matrix multiplication
    Jovanovic, Z.
    Milutinovic, V.
    [J]. IET COMPUTERS AND DIGITAL TECHNIQUES, 2012, 6 (04) : 249 - 256
  • [7] ImageNet Classification with Deep Convolutional Neural Networks
    Krizhevsky, Alex
    Sutskever, Ilya
    Hinton, Geoffrey E.
    [J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
  • [8] FPGA based High Performance Double-precision Matrix Multiplication
    Kumar, Vinay B. Y.
    Joshi, Siddharth
    Patkar, Sachin B.
    Narayanan, H.
    [J]. 22ND INTERNATIONAL CONFERENCE ON VLSI DESIGN HELD JOINTLY WITH 8TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, PROCEEDINGS, 2009, : 341 - 346
  • [9] FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency
    Qiao, Yuran
    Shen, Junzhong
    Xiao, Tao
    Yang, Qianming
    Wen, Mei
    Zhang, Chunyuan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (20)
  • [10] Shen J., 2018, ARXIV180303790