CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture

Cited by: 13
Authors
Ahmadi, Mehdi [1 ]
Vakili, Shervin [1 ]
Langlois, J. M. Pierre [1 ]
Affiliations
[1] Polytechnique Montréal, Department of Computer and Software Engineering, Montréal, QC H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Deep learning; convolutional neural networks; computational dataflow; reconfigurable architecture; application-specific integrated circuit (ASIC)
DOI
10.1109/TCSI.2021.3066967
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline codes
0808; 0809
Abstract
Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient hardware architectures are required for fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolutional layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movement; (2) maximizing the utilization factor of the processing resources that perform the convolutions. This work therefore proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated on the convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3 × 3 and 1 × 1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms for the convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture exploits the structured sparsity in ResNet-50 to reduce latency to 42.5 ms when half of the channels are pruned.
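The latency and pruning figures in the abstract follow from simple MAC-count arithmetic. The sketch below is a minimal Python illustration of that relationship, not the paper's performance model: the helper names (`conv_macs`, `ideal_latency_ms`), the example layer shape, and the PE-array and clock figures (256 PEs at 200 MHz) are all assumptions chosen for illustration.

```python
# Back-of-the-envelope sketch (not the CARLA performance model) of how
# PE utilization and structured channel pruning translate into latency.

def conv_macs(out_h, out_w, out_c, in_c, k):
    """Multiply-accumulate (MAC) operations in one convolutional layer."""
    return out_h * out_w * out_c * in_c * k * k

def ideal_latency_ms(macs, num_pes, clock_hz, utilization):
    """Latency in ms, assuming one MAC per busy PE per cycle."""
    busy_pes = num_pes * utilization
    cycles = macs / busy_pes
    return cycles / clock_hz * 1e3

# Example: a ResNet-style 3x3 layer with a 56x56 output and 64 in/out
# channels (an assumed shape, not one reported in the paper).
macs = conv_macs(56, 56, 64, 64, 3)

# Hypothetical array: 256 PEs at 200 MHz, 98% utilization as in the abstract.
dense = ideal_latency_ms(macs, 256, 200e6, 0.98)

# Structured pruning of half the channels halves the MAC count, which is
# the effect the abstract reports at network scale (92.7 ms -> 42.5 ms).
pruned = ideal_latency_ms(macs // 2, 256, 200e6, 0.98)

print(f"dense:  {dense:.2f} ms")   # ~2.30 ms for this single layer
print(f"pruned: {pruned:.2f} ms")  # ~1.15 ms, about half
```

The point of the sketch is only the proportionality: at a fixed utilization factor, latency scales linearly with the MAC count, which is why pruning half of the channels roughly halves the reported ResNet-50 latency.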
Pages: 3184-3196 (13 pages)