A High-Throughput Reconfigurable Processing Array for Neural Networks

Cited by: 0

Authors:

Wu, Ephrem [1]

Zhang, Xiaoqian [1]

Berman, David [1]

Cho, Inkeun [1]

Affiliation:

[1] Xilinx Inc, San Jose, CA 95124 USA

Source:

2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017

Keywords:

convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth

DOI:

Not available

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).

Pages: 4
