Toward Efficient Co-Design of CNN Quantization and HW Architecture on FPGA Hybrid-Accelerator

Cited by: 0

Authors:

Zhang, Yiran [1]

Li, Guiying [1]

Yuan, Bo [1]

Affiliations:

[1] Southern Univ Sci & Technol, Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen, Peoples R China

Source:

2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

CNN accelerator; FPGA; DSE method;

DOI:

10.1109/ISEDA62518.2024.10617620

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Field-programmable gate arrays (FPGAs) have emerged as a promising platform for accelerating convolutional neural networks (CNNs). In this paper, we propose a low-latency CNN hybrid-accelerator system and an efficient design space exploration (DSE) method. Specifically, our targeted FPGA platform consists of different types of accelerators, which brings two advantages: high concurrency and full utilization of hardware resources (i.e., lookup tables (LUTs) and digital signal processors (DSPs)). In addition, we adopt a bandwidth-aware analytical model of system latency that accounts for pipeline stalls and computation cycles simultaneously. Furthermore, to handle the huge design space encompassing layer-wise CNN quantization and the FPGA hybrid-accelerator architecture, we propose a DSE method named DiMEGA, a differentiable method embedded by a genetic algorithm, which aims to enhance search efficiency. The performance of our CNN hybrid-accelerator system is demonstrated on a PYNQ-Z2 FPGA platform. The experimental results show that the system latency can be reduced by 42% to 48% without sacrificing accuracy, and that the DSE time of DiMEGA is reduced by 23% on ResNet20-CIFAR10 and by 63% on ResNet56-CIFAR10, compared with the state of the art (SOTA).


Pages: 678-683

Page count: 6
