A High-Throughput Reconfigurable Processing Array for Neural Networks

Citations: 0
Authors
Wu, Ephrem [1 ]
Zhang, Xiaoqian [1 ]
Berman, David [1 ]
Cho, Inkeun [1 ]
Affiliation
[1] Xilinx Inc, San Jose, CA 95124 USA
Source
2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017
Keywords
convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth;
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
Pages: 4