An Always-On 3.8 μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS

Cited by: 143
Authors
Bankman, Daniel [1 ]
Yang, Lita [1 ]
Moons, Bert [2 ]
Verhelst, Marian [2 ]
Murmann, Boris [1 ]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[2] Katholieke Univ Leuven, ESAT MICAS, Dept Elect Engn, B-3001 Heverlee, Belgium
Keywords
Binarized neural networks; deep learning; mixed-signal processing; near-memory computing; switched-capacitor (SC)
DOI
10.1109/JSSC.2018.2869150
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
The trend of pushing inference from cloud to edge due to concerns of latency, bandwidth, and privacy has created demand for energy-efficient neural network hardware. This paper presents a mixed-signal binary convolutional neural network (CNN) processor for always-on inference applications that achieves 3.8 μJ/classification at 86% accuracy on the CIFAR-10 image classification data set. The goal of this paper is to establish the minimum-energy point for the representative CIFAR-10 inference task, using the available design tradeoffs. The BinaryNet algorithm for training neural networks with weights and activations constrained to +1 and −1 drastically simplifies multiplications to XNOR and allows integrating all memory on-chip. A weight-stationary, data-parallel architecture with input reuse amortizes memory access across many computations, leaving wide vector summation as the remaining energy bottleneck. This design features an energy-efficient switched-capacitor (SC) neuron that addresses this challenge, employing a 1024-bit thermometer-coded capacitive digital-to-analog converter (CDAC) section for summing pointwise products of CNN filter weights and activations and a 9-bit binary-weighted section for adding the filter bias. The design occupies 6 mm² in 28-nm CMOS, contains 328 kB of on-chip SRAM, operates at 237 frames/s (FPS), and consumes 0.9 mW from 0.6 V/0.8 V supplies. The corresponding energy per classification (3.8 μJ) amounts to a 40× improvement over the previous low-energy benchmark on CIFAR-10, achieved in part by sacrificing some programmability. The SC neuron array is 12.9× more energy efficient than a synthesized digital implementation, which amounts to a 4× advantage in system-level energy per classification.
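The XNOR simplification described in the abstract can be made concrete. The following minimal Python sketch illustrates BinaryNet-style arithmetic, not the chip's mixed-signal implementation; `pack` and `binary_dot` are hypothetical helper names. With ±1 values encoded as single bits (1 for +1, 0 for −1), a 1024-wide pointwise multiply-and-accumulate reduces to XNOR followed by a population count: dot(w, x) = 2·popcount(XNOR(w, x)) − N.

```python
import random

N = 1024  # vector width, matching the paper's 1024-bit thermometer-coded CDAC section

def pack(vec):
    """Pack a {-1, +1} vector into an int bit mask (bit = 1 encodes +1)."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(w_bits, x_bits, n=N):
    """XNOR-popcount dot product of two packed {-1, +1} vectors."""
    xnor = ~(w_bits ^ x_bits) & ((1 << n) - 1)  # 1 wherever the signs agree
    popcount = bin(xnor).count("1")
    return 2 * popcount - n                     # (#agree) - (#disagree)

w = [random.choice((-1, 1)) for _ in range(N)]
x = [random.choice((-1, 1)) for _ in range(N)]
assert binary_dot(pack(w), pack(x)) == sum(a * b for a, b in zip(w, x))
```

On the chip itself, this wide popcount-like summation is what the SC neuron performs in the charge domain via the CDAC, rather than with digital adders.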
Pages: 158-172 (15 pages)