A High-Throughput Reconfigurable Processing Array for Neural Networks

Cited by: 0

Authors:

Wu, Ephrem [1]

Zhang, Xiaoqian [1]

Berman, David [1]

Cho, Inkeun [1]

Affiliations:

[1] Xilinx Inc, San Jose, CA 95124 USA

Source:

2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017

Keywords:

convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).

Pages: 4

50 records in total

[1] High-throughput and compact reconfigurable architectures for recursive filters
Shinde, Vaishali
Kumar, Ganesh Jai
Valencia, Daniel
Alimohammad, Amirhossein
IET COMMUNICATIONS, 2018, 12 (13) : 1616 - 1623
[2] High-throughput systolic array-based accelerator for hybrid transformer-CNN networks
Song, Qingzeng
Dai, Yao
Lu, Hao
Jin, Guanghao
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (08)
[3] Efficiently Removing Sparsity for High-Throughput Stream Processing
Papaphilippou, Philippos
Que, Zhiqiang
Luk, Wayne
2023 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY, ICFPT, 2023, : 244 - 249
[4] Energy-Efficient and High-Throughput FPGA-based Accelerator for Convolutional Neural Networks
Feng, Gan
Hu, Zuyi
Chen, Song
Wu, Feng
2016 13TH IEEE INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUIT TECHNOLOGY (ICSICT), 2016, : 624 - 626
[5] An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic
Jian Fang
Jianyu Chen
Jinho Lee
Zaid Al-Ars
H. Peter Hofstee
Journal of Signal Processing Systems, 2020, 92 : 931 - 947
[6] An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic
Fang, Jian
Chen, Jianyu
Lee, Jinho
Al-Ars, Zaid
Hofstee, H. Peter
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2020, 92 (09): : 931 - 947
[7] Employing Deep Neural Networks and High-Throughput Computing for the Recognition and Prediction of Vein-Like Structures
Niu, Junbo
Chi, Zhiyu
Wang, Feilong
Miao, Bin
Guo, Jiaxu
Ding, Zhifeng
He, Yin
Ma, Xinxin
ADVANCED INTELLIGENT SYSTEMS, 2024,
[8] HIERA: High-Quality and High-Throughput Dehazing Hardware Accelerator with Reconfigurable Computing Unit
Zhang, Junhao
Fan, Dongqi
Chang, Liang
2024 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, ISVLSI, 2024, : 75 - 80
[9] High-Throughput Programmable Systolic Array FFT Architecture and FPGA Implementations
Nash, J. Greg
2014 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS (ICNC), 2014, : 878 - 884
[10] RECO-HCON: A High-Throughput Reconfigurable Compact ASCON Processor for Trusted IoT
Wei, Xiangdong
El-Hadedy, Mohamed
Mosanu, Sergiu
Zhu, Zhengping
Hwu, Wen-Mei
Guo, Xinfei
2022 IEEE 35TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (IEEE SOCC 2022), 2022, : 25 - 30
