A High-Throughput Reconfigurable Processing Array for Neural Networks

Cited by: 0
Authors
Wu, Ephrem [1]
Zhang, Xiaoqian [1]
Berman, David [1]
Cho, Inkeun [1]
Affiliation
[1] Xilinx Inc, San Jose, CA 95124 USA
Source
2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017
Keywords
convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
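The clock-rate arithmetic in the abstract can be illustrated with a small sketch. This is not the authors' design: the SRAM-to-DSP clock ratio, the function names, and the one-operand-per-DSP-cycle model are all assumptions made for illustration. Only the two peak clock rates and the "less than a third" figure come from the abstract.

```python
from fractions import Fraction

# Peak DSP clock rates quoted in the abstract, in MHz.
PEAK_DSP_MHZ = {"UltraScale": 741, "UltraScale+": 891}


def sram_clock_mhz(dsp_clock_mhz: int, clock_ratio: Fraction) -> float:
    """SRAM clock when it runs at `clock_ratio` of the DSP clock.

    `clock_ratio` is a hypothetical parameter; the abstract only says
    the SRAMs run at "a fraction" of the array clock.
    """
    return float(dsp_clock_mhz * clock_ratio)


def words_per_sram_access(clock_ratio: Fraction) -> int:
    """Words each SRAM access must supply so that, on average, one
    operand is available per DSP cycle (assumed model, not from the paper)."""
    if not 0 < clock_ratio <= 1:
        raise ValueError("clock_ratio must be in (0, 1]")
    inverse = 1 / clock_ratio
    # Ceiling division on the exact fraction avoids float rounding.
    return -(-inverse.numerator // inverse.denominator)


def clock_gain_over(typical_fraction: Fraction) -> Fraction:
    """Clock-rate gain of running at peak versus at `typical_fraction` of peak."""
    return 1 / typical_fraction


if __name__ == "__main__":
    ratio = Fraction(1, 2)  # assumed SRAM/DSP clock ratio
    for family, mhz in PEAK_DSP_MHZ.items():
        print(family, sram_clock_mhz(mhz, ratio), words_per_sram_access(ratio))
    # One third of peak is the upper bound quoted for typical designs,
    # so the gain from running at peak is at least this factor.
    print(clock_gain_over(Fraction(1, 3)))
```

Under this model, halving the SRAM clock means each access has to deliver two words, which is one way a slower memory can keep pace with a faster array; the paper itself should be consulted for the mechanism actually used.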
Pages: 4