Toward Efficient Co-Design of CNN Quantization and HW Architecture on FPGA Hybrid-Accelerator

Citations: 0
Authors
Zhang, Yiran [1]
Li, Guiying [1]
Yuan, Bo [1]
Affiliations
[1] Southern Univ Sci & Technol, Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen, Peoples R China
Source
2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
CNN accelerator; FPGA; DSE method;
DOI
10.1109/ISEDA62518.2024.10617620
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The field-programmable gate array (FPGA) has emerged as a promising platform for accelerating convolutional neural networks (CNNs). In this paper, we propose a low-latency CNN hybrid-accelerator system and an efficient design space exploration (DSE) method. Specifically, our targeted FPGA platform consists of different types of accelerators, which offers two advantages: high concurrency and full utilization of hardware resources (i.e., lookup tables (LUTs) and digital signal processors (DSPs)). In addition, we adopt a bandwidth-aware analytical model for system latency that considers pipeline stalls and computation cycles simultaneously. Furthermore, for the huge design space encompassing layer-wise CNN quantization and FPGA hybrid-accelerator architecture, we propose a DSE method (named DiMEGA) aimed at enhancing search efficiency, which is a differentiable method embedded by a genetic algorithm. The performance of our CNN hybrid-accelerator system is demonstrated on a PYNQ-Z2 FPGA platform. The experimental results show that the system latency can be reduced by 42% to 48% without sacrificing accuracy, and that the DSE time of DiMEGA is reduced by 23% on ResNet20-CIFAR10 and by 63% on ResNet56-CIFAR10, compared with the state of the art (SOTA).
Pages: 678-683
Number of pages: 6