A High-Throughput Reconfigurable Processing Array for Neural Networks

Authors
Wu, Ephrem [1]
Zhang, Xiaoqian [1]
Berman, David [1]
Cho, Inkeun [1]
Affiliations
[1] Xilinx Inc, San Jose, CA 95124 USA
Source
2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017
Keywords
convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
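The clock-rate figures in the abstract can be put into rough numbers with a short Python sketch. Only the two peak rates (741 MHz and 891 MHz) and the "less than a third" figure come from the abstract. The function names, the default SRAM clock divider of 2, and the one-operand-per-cycle rate are hypothetical values chosen for illustration; the abstract says only that the SRAMs run at "a fraction" of the array clock.

```python
# Peak DSP clock rates quoted in the abstract, in MHz.
PEAK_DSP_MHZ = {"UltraScale": 741, "UltraScale+": 891}


def typical_clock_ceiling_mhz(peak_mhz, typical_divisor=3):
    """Upper bound on the DSP clock of a 'typical' design, which the
    abstract describes as running at less than a third of the peak rate."""
    return peak_mhz / typical_divisor


def sram_operands_per_access(clock_divider=2, operands_per_dsp_cycle=1):
    """Operands each SRAM access must supply so that a DSP consuming
    `operands_per_dsp_cycle` per fast cycle never stalls, when the SRAM
    runs at dsp_clock / clock_divider.

    ASSUMPTION: both default values are illustrative, not from the paper.
    """
    return clock_divider * operands_per_dsp_cycle


if __name__ == "__main__":
    for family, peak in PEAK_DSP_MHZ.items():
        ceiling = typical_clock_ceiling_mhz(peak)
        print(f"{family}: peak {peak} MHz, typical designs below {ceiling:.0f} MHz")
    print("Operands per SRAM access at half rate:", sram_operands_per_access())
```

Under these assumptions, an SRAM clocked at half the array rate has to deliver two operands per access, which suggests why a slower memory clock is usually paired with a wider data path.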
Pages: 4