Accuracy vs. Efficiency: Achieving both Through Hardware-Aware Quantization and Reconfigurable Architecture with Mixed Precision

Cited by: 1
Authors
Chang, Libo [1, 2]
Zhang, Shengbing [1, 2]
Du, Huimin [3]
Wang, Shiyu [1, 4]
Qiu, Meikang [5]
Wang, Jihe [1, 4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Engn Lab Integrated AeroSp Ground Ocean Big, Xian, Peoples R China
[3] Xian Univ Posts & Telecommun, Sch Elect Engn, Xian, Peoples R China
[4] Minist Educ, Engn Res Ctr Embedded Syst Integrat, Xian, Peoples R China
[5] Texas A&M Univ Commerce, Commerce, TX USA
Source
19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021) | 2021
Keywords
hardware/software co-design; quantization; reconfigurable CNN processor; NETWORKS;
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00033
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose a hardware/software co-design framework that leverages hardware-aware quantization and a reconfigurable processor to improve the computational efficiency of convolutional neural networks (CNNs) on tiny IoT devices based on reconfigurable platforms. First, we propose a multi-objective optimization value function that weighs accuracy, CNN model size, and computational delay, to improve the efficiency of the mixed-precision quantization algorithm based on deep reinforcement learning. Second, we propose a reconfigurable CNN processor that adapts to the computing characteristics of various quantized CNN models, together with a reconfigurable computing array and an on-chip elastic buffer, to improve performance and computing efficiency on edge devices. Finally, we demonstrate the effectiveness of the proposed co-design method through an extensive evaluation on the Ultra96-V2 platform. For the well-known CNNs VGG-16, ResNet-50, and MobileNet-V2, the experimental results show throughputs of 216.6 GOPS, 214.0 GOPS, and 53.6 GOPS and computing efficiencies of 0.63 GOPS/DSP, 0.64 GOPS/DSP, and 0.24 GOPS/DSP, respectively. In addition, the design achieves a better trade-off between computing efficiency and accuracy than recently proposed CNN processors with fixed bit-width and with mixed precision.
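The abstract does not give the form of the multi-objective value function, so the following is only a minimal sketch of one way such a function could weigh accuracy, model size, and computational delay when ranking mixed-precision configurations. The weighted-sum form, the weights, the `Candidate` type, and all numbers are assumptions made for illustration, not the authors' formulation or measurements.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One quantization configuration with its costs (all values hypothetical)."""
    name: str
    accuracy: float    # top-1 accuracy in [0, 1]
    size_mb: float     # model size in megabytes
    latency_ms: float  # inference delay in milliseconds


def value(c, baseline, w_acc=1.0, w_size=0.3, w_lat=0.3):
    """Assumed weighted sum: penalize accuracy loss, reward relative
    savings in model size and delay with respect to a baseline."""
    acc_term = c.accuracy - baseline.accuracy          # negative when accuracy drops
    size_term = 1.0 - c.size_mb / baseline.size_mb     # fraction of size saved
    lat_term = 1.0 - c.latency_ms / baseline.latency_ms  # fraction of delay saved
    return w_acc * acc_term + w_size * size_term + w_lat * lat_term


# Hypothetical full-precision baseline and three quantized candidates.
baseline = Candidate("fp32", 0.760, 100.0, 50.0)
candidates = [
    Candidate("uniform-8bit", 0.755, 25.0, 20.0),
    Candidate("mixed-4/8bit", 0.748, 15.0, 14.0),
    Candidate("uniform-2bit", 0.500, 6.25, 8.0),
]

# Pick the configuration with the highest value under the assumed weights.
best = max(candidates, key=lambda c: value(c, baseline))
print(best.name)
```

In a reinforcement-learning search, a scalar of this kind would typically serve as the reward for an agent that picks per-layer bit-widths; changing the weights shifts the preferred configuration toward higher accuracy or lower cost.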
Pages: 151-158
Page count: 8