A High-Throughput Reconfigurable Processing Array for Neural Networks

Cited by: 0

Authors:

Wu, Ephrem [1]

Zhang, Xiaoqian [1]

Berman, David [1]

Cho, Inkeun [1]

Affiliations:

[1] Xilinx Inc, San Jose, CA 95124 USA

Source:

2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017

Keywords:

convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
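The clock figures in the abstract can be put side by side with a little arithmetic. The sketch below is only illustrative: the peak rates come from the abstract, while the helper names and the SRAM clock ratio of 2 are assumptions, since the abstract says only that the data SRAMs run at "a fraction" of the array clock.

```python
# Peak DSP clock rates quoted in the abstract, in MHz.
PEAK_DSP_MHZ = {"UltraScale": 741, "UltraScale+": 891}


def third_of_peak(peak_mhz: float) -> float:
    """Upper bound on the clock of a design running below a third of peak."""
    return peak_mhz / 3


def sram_clock(peak_mhz: float, ratio: int) -> float:
    """SRAM clock when the array runs at peak and the SRAMs at peak / ratio.

    `ratio` is a hypothetical parameter; the abstract does not state it.
    """
    return peak_mhz / ratio


for device, peak in PEAK_DSP_MHZ.items():
    bound = third_of_peak(peak)
    slow = sram_clock(peak, ratio=2)  # ratio=2 is an assumed example value
    print(f"{device}: peak {peak} MHz, a third of peak {bound:.0f} MHz, "
          f"SRAM at assumed 1/2 rate {slow:.1f} MHz")
```

Under that assumed ratio, each SRAM would need to deliver two words per SRAM cycle to keep one array input fed every fast cycle, which is the kind of bandwidth trade-off that running the memories at a lower clock implies.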


Pages: 4
