Accuracy vs. Efficiency: Achieving Both Through Hardware-Aware Quantization and Reconfigurable Architecture with Mixed Precision

Cited by: 1

Authors:

Chang, Libo [1,2]

Zhang, Shengbing [1,2]

Du, Huimin [3]

Wang, Shiyu [1,4]

Qiu, Meikang [5]

Wang, Jihe [1,4]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China

[2] Natl Engn Lab Integrated AeroSp Ground Ocean Big, Xian, Peoples R China

[3] Xian Univ Posts & Telecommun, Sch Elect Engn, Xian, Peoples R China

[4] Minist Educ, Engn Res Ctr Embedded Syst Integrat, Xian, Peoples R China

[5] Texas A&M Univ Commerce, Commerce, TX USA

Source:

19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021) | 2021

Keywords:

hardware/software co-design; quantization; reconfigurable CNN processor; networks

DOI:

10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00033

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

We propose a hardware/software co-design framework that leverages hardware-aware quantization and a reconfigurable processor to improve the computational efficiency of convolutional neural networks (CNNs) on tiny IoT devices built on reconfigurable platforms. First, we propose a multi-objective value function that weighs accuracy, CNN model size, and computational delay, to improve the efficiency of a mixed-precision quantization algorithm based on deep reinforcement learning. Second, we propose a reconfigurable CNN processor that adapts to the computing characteristics of various quantized CNN models, together with a reconfigurable computing array and an on-chip elastic buffer, to improve performance and computing efficiency on edge devices. Finally, we demonstrate the effectiveness of the proposed co-design method through an extensive evaluation on the Ultra96-V2 platform. For the well-known CNNs VGG-16, ResNet-50, and MobileNet-V2, the experimental results show throughputs of 216.6 GOPS, 214.0 GOPS, and 53.6 GOPS and computing efficiencies of 0.63 GOPS/DSP, 0.64 GOPS/DSP, and 0.24 GOPS/DSP, respectively. In addition, the design achieves a better trade-off between computing efficiency and accuracy than recently proposed fixed-bit-width and mixed-precision CNN processors.
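The record does not state the form of the multi-objective value function. The following is a minimal sketch assuming a simple weighted sum. In it, accuracy change, model-size reduction, and latency reduction are each measured against a reference configuration (for example, uniform 8-bit). The names `Candidate`, `model_size_mb`, and `reward` are hypothetical and are not taken from the paper, as are the weights `w_acc`, `w_size`, and `w_lat`. The toy layer sizes, accuracies, and latencies are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Metrics for one quantization policy (hypothetical structure)."""
    accuracy: float    # top-1 accuracy in [0, 1]
    size_mb: float     # weight storage in megabytes
    latency_ms: float  # measured or modeled inference delay


def model_size_mb(param_counts, bit_widths):
    """Weight storage when layer i stores param_counts[i] weights at bit_widths[i] bits."""
    total_bits = sum(n * b for n, b in zip(param_counts, bit_widths))
    return total_bits / 8 / 1e6


def reward(cand, ref, w_acc=10.0, w_size=0.5, w_lat=0.5):
    """Hypothetical weighted-sum value: penalize accuracy loss, reward savings.

    All three terms are relative to a reference configuration `ref`
    (e.g. uniform 8-bit); the default weights are illustrative assumptions.
    """
    acc_term = cand.accuracy - ref.accuracy            # negative if accuracy drops
    size_term = 1.0 - cand.size_mb / ref.size_mb       # fraction of storage saved
    lat_term = 1.0 - cand.latency_ms / ref.latency_ms  # fraction of delay saved
    return w_acc * acc_term + w_size * size_term + w_lat * lat_term


if __name__ == "__main__":
    params = [1_000_000, 2_000_000, 1_000_000]  # toy 3-layer network
    ref = Candidate(0.70, model_size_mb(params, [8, 8, 8]), 10.0)
    mixed = Candidate(0.69, model_size_mb(params, [8, 4, 6]), 7.0)
    print(ref.size_mb, mixed.size_mb)    # 4.0 2.75
    print(round(reward(mixed, ref), 5))  # 0.20625
```

Raising `w_acc` makes such a search more conservative about bit-width reductions that cost accuracy. In a DRL-based search, a scalar like this would typically serve as the reward returned after each complete per-layer bit-width assignment is evaluated.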


Pages: 151-158

Page count: 8

Number of references: 32
