An Always-On 3.8 μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor With All Memory on Chip in 28-nm CMOS

Cited by: 143
Authors
Bankman, Daniel [1 ]
Yang, Lita [1 ]
Moons, Bert [2 ]
Verhelst, Marian [2 ]
Murmann, Boris [1 ]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[2] Katholieke Univ Leuven, ESAT MICAS, Dept Elect Engn, B-3001 Heverlee, Belgium
Keywords
Binarized neural networks; deep learning; mixed-signal processing; near-memory computing; switched-capacitor (SC)
DOI
10.1109/JSSC.2018.2869150
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
The trend of pushing inference from cloud to edge due to concerns of latency, bandwidth, and privacy has created demand for energy-efficient neural network hardware. This paper presents a mixed-signal binary convolutional neural network (CNN) processor for always-on inference applications that achieves 3.8 μJ/classification at 86% accuracy on the CIFAR-10 image classification data set. The goal of this paper is to establish the minimum-energy point for the representative CIFAR-10 inference task, using the available design tradeoffs. The BinaryNet algorithm for training neural networks with weights and activations constrained to +1 and −1 drastically simplifies multiplications to XNOR and allows integrating all memory on-chip. A weight-stationary, data-parallel architecture with input reuse amortizes memory access across many computations, leaving wide vector summation as the remaining energy bottleneck. This design features an energy-efficient switched-capacitor (SC) neuron that addresses this challenge, employing a 1024-bit thermometer-coded capacitive digital-to-analog converter (CDAC) section for summing pointwise products of CNN filter weights and activations and a 9-bit binary-weighted section for adding the filter bias. The design occupies 6 mm² in 28-nm CMOS, contains 328 kB of on-chip SRAM, operates at 237 frames/s (FPS), and consumes 0.9 mW from 0.6 V/0.8 V supplies. The corresponding energy per classification (3.8 μJ) amounts to a 40× improvement over the previous low-energy benchmark on CIFAR-10, achieved in part by sacrificing some programmability. The SC neuron array is 12.9× more energy efficient than a synthesized digital implementation, which amounts to a 4× advantage in system-level energy per classification.
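The XNOR simplification described in the abstract can be made concrete. The following minimal Python sketch illustrates BinaryNet-style arithmetic, not the chip's mixed-signal implementation; `pack` and `binary_dot` are hypothetical helper names. With ±1 values encoded as single bits (1 for +1, 0 for −1), a 1024-wide pointwise multiply-and-accumulate reduces to XNOR followed by a population count: dot(w, x) = 2·popcount(XNOR(w, x)) − N.

```python
import random

N = 1024  # vector width, matching the paper's 1024-bit thermometer-coded CDAC section

def pack(vec):
    """Pack a {-1, +1} vector into an int bit mask (bit = 1 encodes +1)."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(w_bits, x_bits, n=N):
    """XNOR-popcount dot product of two packed {-1, +1} vectors."""
    xnor = ~(w_bits ^ x_bits) & ((1 << n) - 1)  # 1 wherever the signs agree
    popcount = bin(xnor).count("1")
    return 2 * popcount - n                     # (#agree) - (#disagree)

w = [random.choice((-1, 1)) for _ in range(N)]
x = [random.choice((-1, 1)) for _ in range(N)]
assert binary_dot(pack(w), pack(x)) == sum(a * b for a, b in zip(w, x))
```

On the chip itself, this wide popcount-like summation is what the SC neuron performs in the charge domain via the CDAC, rather than with digital adders.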
Pages: 158-172 (15 pages)