A High-Throughput Reconfigurable Processing Array for Neural Networks

Cited by: 0

Authors:

Wu, Ephrem [1]

Zhang, Xiaoqian [1]

Berman, David [1]

Cho, Inkeun [1]

Affiliations:

[1] Xilinx Inc, San Jose, CA 95124 USA

Source:

2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2017

Keywords:

convolutional neural networks; timing closure; matrix multiplication; FPGA; DSP; cache; memory bandwidth;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale (741 MHz) and UltraScale+ (891 MHz) devices. To enhance portability and consistency of timing closure, this array operates at a high clock rate while data SRAMs run at a fraction of this rate. As a proof of concept, this paper outlines a processing array for matrix multiplication and convolution, the most compute-intensive operations of a convolutional neural network (CNN).
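
The clock-rate argument in the abstract can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper: the DSP count (1000), the assumption of one multiply-accumulate per DSP per cycle, and the 1/2 SRAM-to-array clock ratio are all hypothetical values chosen for illustration. Only the 741 MHz and 891 MHz peak clock rates and the "less than a third" figure come from the abstract.

```python
from fractions import Fraction

# Peak DSP clock rates quoted in the abstract, in MHz.
PEAK_DSP_CLOCK_MHZ = {"UltraScale": 741, "UltraScale+": 891}


def peak_gmacs(num_dsps: int, clock_mhz: float) -> float:
    """Peak throughput in GMAC/s, ASSUMING one MAC per DSP per cycle."""
    return num_dsps * clock_mhz / 1000.0


def sram_words_per_access(array_words_per_cycle: int, clock_ratio: Fraction) -> Fraction:
    """Words each SRAM access must supply to keep the array fed.

    clock_ratio is (SRAM clock / array clock); the abstract only says the
    SRAMs run at "a fraction" of the array rate, so any value passed here
    is an assumption.
    """
    return array_words_per_cycle / clock_ratio


if __name__ == "__main__":
    num_dsps = 1000  # hypothetical DSP count, not from the paper
    for family, mhz in PEAK_DSP_CLOCK_MHZ.items():
        full = peak_gmacs(num_dsps, mhz)
        third = peak_gmacs(num_dsps, mhz / 3)  # "a third of the peak clock rate"
        print(f"{family}: {full} GMAC/s at peak clock, {third} GMAC/s at one third")

    # Hypothetical: SRAMs clocked at half the array rate.
    print(sram_words_per_access(1, Fraction(1, 2)), "words per SRAM access")
```

Under these assumptions, throughput scales linearly with the DSP clock, so a design closing timing at one third of the peak rate reaches one third of the peak MAC rate, and a slower SRAM clock has to be compensated by proportionally wider accesses.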


Pages: 4
