PV-MAC: Multiply-and-accumulate unit structure exploiting precision variability in on-device convolutional neural networks

Cited by: 5
Authors:
Kang, Jongsung [1]
Kim, Taewhan [1]
Affiliations:
[1] Seoul Natl Univ, Sch Elect & Comp Engn, Seoul, South Korea
Funding:
National Research Foundation of Singapore
Keywords: Convolutional neural networks; Multiply-accumulate unit; On-device inference; Precision variability
DOI: 10.1016/j.vlsi.2019.11.003
CLC classification: TP3 [Computing technology, computer technology]
Subject classification: 0812
Abstract:
This work proposes a new multiply-and-accumulate (MAC) processing unit structure that is highly suitable for on-device convolutional neural networks (CNNs). By observing that the bit-lengths needed to represent the numerical values of the input/output neurons and weight parameters in on-device CNNs are small (i.e., low precision), usually no more than 9 bits, and vary across network layers, we propose a layer-by-layer composable MAC unit structure that is best suited to the 'majority' of the operations with low precision, through maximal parallelism of the MAC operations in the unit with very little subsidiary processing overhead, while remaining sufficiently effective in MAC unit resource utilization for the rest of the operations. Precisely, the two essences of this work are: (1) our MAC unit structure supports two operation modes, (mode-0) operating a single multiplier for every majority multiplication of low precision and (mode-1) operating multiple (a minimal number of) multipliers for the remaining multiplications of high precision; (2) for a set of input CNNs, we formulate the exploration of the size of a single internal multiplier in the MAC unit to derive an 'economical' instance, in terms of computation and energy cost, of the MAC unit structure across the whole set of network layers. Our strategy is in strong contrast with the conventional MAC unit design, in which the MAC input size must be large enough to cover the largest bit-size of the activation inputs/outputs and weight parameters. We show analytically and empirically that our MAC unit structure with the exploration of its instances is very effective, reducing computation cost per multiplication operation by 4.68~30.3% and saving energy cost by 43.3% on average for the convolutional operations in AlexNet and VGG-16 over the use of conventional MAC unit structures.
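The two operation modes described above can be illustrated with a minimal sketch. This is not the authors' hardware design; it is an illustrative software model, assuming mode-1 composes a wide multiplication from k-bit partial products via schoolbook limb decomposition (the function name and unsigned-operand simplification are ours):

```python
def mac_multiply(a, b, k):
    """Model of a composable MAC multiplier built from k-bit multipliers.

    mode-0: both operands fit in k bits -> one multiplier suffices.
    mode-1: split each operand into k-bit limbs and accumulate the
            shifted k-bit partial products (schoolbook decomposition).
    Operands are assumed unsigned for simplicity.
    """
    if a < (1 << k) and b < (1 << k):
        return a * b  # mode-0: a single k-bit multiply

    def limbs(x):
        # Split x into little-endian k-bit chunks.
        out = []
        while x:
            out.append(x & ((1 << k) - 1))
            x >>= k
        return out or [0]

    acc = 0
    for i, la in enumerate(limbs(a)):
        for j, lb in enumerate(limbs(b)):
            # Each la * lb fits in a k-bit-input multiplier.
            acc += (la * lb) << (k * (i + j))
    return acc
```

For example, with k = 8, multiplying two 9-bit values takes four 8-bit partial products (mode-1), while the majority low-precision case costs only one (mode-0); choosing k then trades the per-operation cost of mode-0 against the composition overhead of mode-1, which is the exploration the paper formulates.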
Pages: 76-85 (10 pages)