Toward Efficient Co-Design of CNN Quantization and HW Architecture on FPGA Hybrid-Accelerator

Cited by: 0
Authors
Zhang, Yiran [1]
Li, Guiying [1]
Yuan, Bo [1]
Affiliations
[1] Southern Univ Sci & Technol, Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen, Peoples R China
Source
2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
CNN accelerator; FPGA; DSE method;
DOI
10.1109/ISEDA62518.2024.10617620
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Field-programmable gate arrays (FPGAs) have emerged as a promising platform for accelerating convolutional neural networks (CNNs). In this paper, we propose a low-latency CNN hybrid-accelerator system and an efficient design space exploration (DSE) method. Specifically, our targeted FPGA platform combines different types of accelerators to gain two advantages: high concurrency and full utilization of hardware resources (i.e., lookup tables (LUTs) and digital signal processors (DSPs)). In addition, we adopt a bandwidth-aware analytical model of system latency that accounts for pipeline stalls and computation cycles simultaneously. Furthermore, to handle the huge design space encompassing layer-wise CNN quantization and the FPGA hybrid-accelerator architecture, we propose a DSE method named DiMEGA, a differentiable method embedded by a genetic algorithm, which is aimed at enhancing search efficiency. The performance of our CNN hybrid-accelerator system is demonstrated on a PYNQ-Z2 FPGA platform. The experimental results show that system latency can be reduced by 42% to 48% without sacrificing accuracy, and that the DSE time of DiMEGA is reduced by 23% on ResNet20-CIFAR10 and by 63% on ResNet56-CIFAR10, compared with the state of the art (SOTA).
Pages: 678-683
Number of pages: 6