A Research and Design of Reconfigurable CNN Co-Processor for Edge Computing

Authors
Li W. [1]
Chen Y. [1]
Chen T. [1]
Nan L. [1]
Du Y. [1]
Affiliation
[1] School of Cryptographic Engineering, Strategic Support Force Information Engineering University, Zhengzhou
Keywords
ASIC; Convolutional Neural Network (CNN); Hardware acceleration; Reconfigurable
DOI
10.11999/JEIT230509
Abstract
With the development of deep learning, the parameter counts and computational load of Convolutional Neural Networks (CNNs) have grown dramatically, which greatly raises the cost of deploying CNN algorithms on edge devices. To ease deployment and to reduce the inference latency and energy consumption of CNNs at the edge, a reconfigurable CNN co-processor for edge computing is proposed. Based on a channel-wise processing dataflow, the proposed two-level distributed storage scheme addresses the power overhead and performance degradation caused by heavy data movement between Processing Elements (PEs) and by large-scale migration of intermediate data on chip. To avoid a complex data-propagation interconnect within the PE array and to reduce control complexity, a flexible local access mechanism and a padding mechanism based on address translation are proposed; together they support standard convolution, depthwise separable convolution, pooling, and fully connected operations. The proposed co-processor contains 256 PEs and 176 kB of on-chip SRAM. Synthesized and placed and routed in a 55-nm CMOS process at the TT corner (25 °C, 1.2 V), the co-processor achieves a maximum clock frequency of 328 MHz and occupies an area of 4.41 mm². Its peak performance is 163.8 GOPs at 320 MHz, and its area efficiency is 37.14 GOPs/mm². The energy efficiency on LeNet-5 and MobileNet is 210.7 GOPs/W and 340.08 GOPs/W, respectively, which meets the energy-efficiency and performance requirements of edge intelligent computing scenarios. © 2024 Science Press. All rights reserved.
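The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per cycle and that a multiply-accumulate is counted as two operations; the abstract does not state this convention, but it is the usual one and it reproduces the reported numbers. The variable names are illustrative only.

```python
# Sanity check of the abstract's peak-performance and area-efficiency figures.
# ASSUMPTION (not stated in the abstract): one MAC per PE per cycle,
# counted as 2 operations (multiply + add).

NUM_PES = 256            # Processing Elements, from the abstract
OPS_PER_MAC = 2          # assumed counting convention
CLOCK_HZ = 320e6         # frequency at which peak performance is reported
AREA_MM2 = 4.41          # post-layout area, from the abstract

# Peak throughput in giga-operations per second.
peak_gops = NUM_PES * OPS_PER_MAC * CLOCK_HZ / 1e9

# The abstract rounds peak performance to one decimal before dividing by area.
reported_peak_gops = round(peak_gops, 1)
area_efficiency = reported_peak_gops / AREA_MM2

print(f"peak performance: {peak_gops:.2f} GOPs")
print(f"area efficiency:  {area_efficiency:.2f} GOPs/mm^2")
```

Under that assumption the peak works out to 163.84 GOPs, which matches the reported 163.8 GOPs, and dividing the rounded figure by 4.41 mm² gives the reported 37.14 GOPs/mm². The per-network energy-efficiency figures cannot be checked the same way because the abstract does not report power.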
Pages: 1499-1512
Page count: 13