Better Scalability: Improvement of Block-Based CNN Accelerator for FPGAs

Cited by: 0
Authors
Chen, Yan [1 ]
Tanaka, Kiyofumi [1 ]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Ishikawa 9231292, Japan
Keywords
FPGA; hardware accelerators; convolutional neural networks; MobileNetV2; YOLOv3; ONTOLOGY
DOI
10.1109/ACCESS.2024.3514325
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline code
0812
Abstract
As Convolutional Neural Networks (CNNs) have become widely used, numerous accelerators have been designed, falling mainly into two architectures: the Overlay architecture, with a single Processing Element (PE) array, and the Dataflow architecture, with one PE array per layer. Overlay-architecture accelerators demand large off-chip memory bandwidth, whereas Dataflow-architecture accelerators demand large on-chip memory capacity. We designed a hybrid architecture that exploits a characteristic of modern CNN models, their composition from repetitive blocks, to combine the advantages of both architectures while avoiding their drawbacks. It achieves very high throughput while requiring less than 8% of the bandwidth that Overlay-architecture accelerators need to run MobileNetV2, and, unlike Dataflow-architecture accelerators, it does not require significant on-chip memory. A comparison shows that its area efficiency far surpasses that of existing works. However, its scalability remained suboptimal, and this study addresses that issue. The improved accelerator demonstrated consistent efficiency across the capacity range of existing devices and was successfully implemented on a compact 7Z007S. When deployed on a large-scale VU13P, it achieved a throughput exceeding 10,000 frames per second running MobileNetV2.
Pages: 187587-187603
Page count: 17
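The tradeoff described in the abstract can be illustrated with a toy memory-demand model for a network built from repeated identical blocks. All numbers (block count, layers per block, weight size) and the double-buffering assumption below are hypothetical, chosen only to show why a block-granularity design can need far less on-chip weight storage than a full per-layer Dataflow design; they are not figures from the paper.

```python
# Toy comparison of weight-memory demands for three accelerator
# organizations, on a CNN made of R repetitions of an L-layer block.
# All parameters are illustrative assumptions, not values from the paper.

R = 8        # hypothetical number of repeated blocks in the model
L = 3        # hypothetical layers per block
W = 0.25e6   # hypothetical weight bytes per layer


def overlay_offchip_traffic():
    """Single shared PE array: every layer's weights are reloaded
    from off-chip DRAM, so traffic grows with total layer count."""
    return R * L * W


def dataflow_onchip_weights():
    """One PE array per layer: all weights of all layers must be
    resident on chip at once."""
    return R * L * W


def block_based_onchip_weights():
    """One PE array per layer of a single block: with double
    buffering, only ~2 blocks of weights are resident at a time
    (current block computing, next block prefetching)."""
    return 2 * L * W


if __name__ == "__main__":
    print(f"overlay DRAM traffic     : {overlay_offchip_traffic() / 1e6:.2f} MB")
    print(f"dataflow on-chip weights : {dataflow_onchip_weights() / 1e6:.2f} MB")
    print(f"block-based on-chip      : {block_based_onchip_weights() / 1e6:.2f} MB")
```

Under this model the block-based design still moves each weight across the DRAM interface once, but the transfer is overlapped with an entire block's computation, which relaxes the instantaneous bandwidth requirement relative to the Overlay case while keeping resident weights proportional to one block rather than the whole network.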