A High-Throughput Reconfigurable Processing Array for Neural Networks

Cited by: 0
Authors
Wu, Ephrem [1]
Zhang, Xiaoqian [1]
Berman, David [1]
Cho, Inkeun [1]
Affiliations
[1] Xilinx Inc, San Jose, CA 95124 USA
Source
2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017
Keywords
convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
FPGA-based neural networks typically leave performance on the table because their DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, the array operates at a high clock rate while the data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
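To make the clock-rate argument in the abstract concrete, the following Python sketch works through the arithmetic. It is only an illustration under stated assumptions, not the authors' model: the one-MAC-per-DSP-per-cycle rate, the DSP count of 1024, the SRAM clock fraction of 1/2, and the helper names `peak_gmacs` and `sram_words_per_access` are all hypothetical. Only the two peak clock rates come from the abstract.

```python
from fractions import Fraction

# Peak DSP clock rates quoted in the abstract, in MHz.
PEAK_DSP_MHZ = {"UltraScale": 741, "UltraScale+": 891}


def peak_gmacs(num_dsps: int, clock_mhz: float) -> float:
    """Peak throughput in GMAC/s, assuming one MAC per DSP per cycle (an assumption)."""
    return num_dsps * clock_mhz / 1000.0


def sram_words_per_access(sram_clock_fraction: Fraction) -> int:
    """Words an SRAM must supply per SRAM cycle to feed one word per DSP cycle."""
    ratio = 1 / sram_clock_fraction
    if ratio.denominator != 1:
        raise ValueError("this sketch assumes an integer clock ratio")
    return ratio.numerator


if __name__ == "__main__":
    num_dsps = 1024  # hypothetical array size, not taken from the paper
    for family, mhz in PEAK_DSP_MHZ.items():
        at_peak = peak_gmacs(num_dsps, mhz)
        at_third = peak_gmacs(num_dsps, mhz / 3)
        print(f"{family}: {at_peak:.1f} GMAC/s at peak clock, "
              f"{at_third:.1f} GMAC/s at one third of it")
    # Hypothetical case: SRAMs clocked at half the DSP rate.
    print("words per SRAM access:", sram_words_per_access(Fraction(1, 2)))
```

Under these assumptions, an array running at a third of the peak clock delivers a third of the peak throughput, and an SRAM clocked at 1/k of the DSP rate has to deliver k words per access to keep the array fed.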
Pages: 4