CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture

Cited by: 13
Authors
Ahmadi, Mehdi [1]
Vakili, Shervin [1]
Langlois, J. M. Pierre [1]
Affiliations
[1] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Deep learning; convolutional neural networks; computational dataflow; reconfigurable architecture; application-specific integrated circuit (ASIC); NEURAL-NETWORK; MEMORY; EFFICIENT; PROCESSOR; CHIP
DOI
10.1109/TCSI.2021.3066967
Chinese Library Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolutional layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movements; (2) maximizing the utilization factor of processing resources when performing convolutions. This work thus proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated on the convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3×3 and 1×1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms for the convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture exploits the structured sparsity in ResNet-50 to reduce latency to 42.5 ms when half of the channels are pruned.
Pages: 3184-3196
Page count: 13
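
To make the abstract's 98% PE utilization figure concrete, the following minimal Python sketch shows one way a utilization factor can be computed when an output feature map is tiled onto a fixed PE array. The array shape (14 × 16) and the purely spatial tiling scheme are illustrative assumptions for this sketch, not the dataflow described in the paper.

import math

def pe_utilization(out_h: int, out_w: int, pe_rows: int = 14, pe_cols: int = 16) -> float:
    """Fraction of PE slots doing useful work when an out_h x out_w output
    plane is tiled onto a pe_rows x pe_cols array; edge tiles that overhang
    the feature map leave PEs idle."""
    tiles = math.ceil(out_h / pe_rows) * math.ceil(out_w / pe_cols)
    useful = out_h * out_w                 # output pixels actually computed
    occupied = tiles * pe_rows * pe_cols   # PE slots issued, incl. idle padding
    return useful / occupied

# Example: a 56 x 56 output map (common in ResNet-50) on the assumed array
print(f"{pe_utilization(56, 56):.1%}")     # 87.5%, since 56 is not a multiple of 16

Under this simple model, utilization drops whenever layer dimensions are not multiples of the array dimensions, which is why supporting several reconfigurable dataflows, as the paper proposes, helps keep utilization near 98% across layers with diverse structures.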