MALMM: A multi-array architecture for large-scale matrix multiplication on FPGA

Cited: 4
Authors
Huang, You [1 ,2 ]
Shen, Junzhong [1 ,2 ]
Qiao, Yuran [1 ,2 ]
Wen, Mei [1 ,2 ]
Zhang, Chunyuan [1 ,2 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Proc, Changsha 410073, Hunan, Peoples R China
Source
IEICE ELECTRONICS EXPRESS | 2018, Vol. 15, No. 10
Keywords
matrix multiplication; field-programmable gate arrays (FPGAs); work-stealing;
DOI
10.1587/elex.15.20180286
CLC Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Code
0808; 0809;
Abstract
Large-scale floating-point matrix multiplication is widely used in many scientific and engineering applications. Most existing works focus on designing a linear array architecture to accelerate matrix multiplication on FPGAs. This paper extends that architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we present a work-stealing scheme to ensure balanced workload partitioning among the multiple linear arrays. Furthermore, an analytical model is developed to determine the optimal parameters for matrix multiplication acceleration. Experiments on real-life convolutional neural networks (CNNs) show that we can obtain the optimal extension of the linear array architecture.
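The abstract's work-stealing scheme balances matrix tiles across multiple linear arrays: an array that drains its own queue steals pending tiles from the busiest peer. The sketch below illustrates only this general work-stealing idea in Python; the function name, the round-robin seeding, and the steal-from-busiest policy are illustrative assumptions, not the paper's actual hardware scheduler.

```python
from collections import deque

def work_stealing_run(num_workers, tiles):
    """Simulate work-stealing over a list of matrix tiles.

    Illustrative sketch only: tiles are seeded round-robin, each
    worker consumes from the head of its own queue, and an idle
    worker steals from the tail of the fullest peer queue.
    """
    queues = [deque() for _ in range(num_workers)]
    for i, tile in enumerate(tiles):
        queues[i % num_workers].append(tile)  # static initial partition

    done = [[] for _ in range(num_workers)]
    progress = True
    while progress:
        progress = False
        for w in range(num_workers):
            if queues[w]:
                done[w].append(queues[w].popleft())  # own work: head
                progress = True
            else:
                victim = max(range(num_workers), key=lambda v: len(queues[v]))
                if queues[victim]:
                    done[w].append(queues[victim].pop())  # steal: tail
                    progress = True
    return done
```

Even with a skewed initial partition, every tile is processed exactly once, and formerly idle workers finish with a nonempty share, which is the load-balancing effect the paper's scheme targets.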
Pages: 12
Related Papers
13 records
  • [1] [Anonymous], 2014, LEARNING FACE REPRES
  • [2] Scheduling multithreaded computations by work stealing
    Blumofe, RD
    Leiserson, CE
    [J]. JOURNAL OF THE ACM, 1999, 46 (05) : 720 - 748
  • [3] Cong Jason, 2014, Artificial Neural Networks and Machine Learning - ICANN 2014. 24th International Conference on Artificial Neural Networks. Proceedings: LNCS 8681, P281, DOI 10.1007/978-3-319-11179-7_36
  • [4] Dou, Yong, 2005, Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, FPGA'05, P86, DOI 10.1145/1046192.1046204
  • [5] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [6] FPGA accelerator for floating-point matrix multiplication
    Jovanovic, Z.
    Milutinovic, V.
    [J]. IET COMPUTERS AND DIGITAL TECHNIQUES, 2012, 6 (04) : 249 - 256
  • [7] ImageNet Classification with Deep Convolutional Neural Networks
    Krizhevsky, Alex
    Sutskever, Ilya
    Hinton, Geoffrey E.
    [J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
  • [8] FPGA based High Performance Double-precision Matrix Multiplication
    Kumar, Vinay B. Y.
    Joshi, Siddharth
    Patkar, Sachin B.
    Narayanan, H.
    [J]. 22ND INTERNATIONAL CONFERENCE ON VLSI DESIGN HELD JOINTLY WITH 8TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, PROCEEDINGS, 2009, : 341 - 346
  • [9] FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency
    Qiao, Yuran
    Shen, Junzhong
    Xiao, Tao
    Yang, Qianming
    Wen, Mei
    Zhang, Chunyuan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (20)
  • [10] Shen J., 2018, ARXIV180303790