UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks

Cited by: 72
Authors
Baskin, Chaim [1 ]
Liss, Natan [1 ]
Schwartz, Eli [2 ]
Zheltonozhskii, Evgenii [1 ]
Giryes, Raja [2 ]
Bronstein, Alex M. [1 ]
Mendelson, Avi [1 ]
Affiliations
[1] Technion, Dept Comp Sci, CS Taub Bldg, IL-3200003 Haifa, Israel
[2] Tel Aviv Univ, Sch Elect Engn, POB 39040, IL-6997801 Tel Aviv, Israel
Source
ACM TRANSACTIONS ON COMPUTER SYSTEMS | 2021 / Vol. 37 / Issue 1-4
Keywords
Deep learning; neural networks; quantization; efficient deep learning
DOI
10.1145/3444943
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline code
081202
Abstract
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of injecting noise into the weights, we find that activations can also be quantized to as low as 8 bits with only minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to existing uniform quantization techniques for neural networks. We further propose a novel complexity metric, the number of bit operations performed (BOPs), and show that this metric is linearly related to logic utilization and power. We suggest evaluating the trade-off between accuracy and complexity (BOPs). The proposed method, when evaluated on ResNet18/34/50 and MobileNet on ImageNet, outperforms the prior state of the art in both the low-complexity regime and the high-accuracy regime. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
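To make the abstract's two technical ingredients more concrete, the sketch below illustrates (i) training-time uniform noise injection that emulates a k-quantile weight quantizer and (ii) a per-layer bit-operations (BOPs) count. This is a minimal PyTorch sketch under our own assumptions rather than the authors' released implementation; the helper names `kquantile_edges`, `inject_uniform_noise`, and `conv_bops` are hypothetical, and the BOPs formula here is an assumed approximation of the metric described in the abstract.

```python
import math
import torch

def kquantile_edges(w: torch.Tensor, k: int) -> torch.Tensor:
    # Bin edges of a k-quantile quantizer: each bin holds roughly 1/k of the weights.
    probs = torch.linspace(0.0, 1.0, k + 1, dtype=w.dtype, device=w.device)
    return torch.quantile(w.flatten(), probs)

def inject_uniform_noise(w: torch.Tensor, k: int) -> torch.Tensor:
    # Training-time surrogate for quantization: add uniform noise whose support
    # matches the width of the quantile bin that each weight falls into.
    edges = kquantile_edges(w, k)
    idx = torch.clamp(torch.bucketize(w, edges) - 1, 0, k - 1)  # bin index per weight
    widths = edges[idx + 1] - edges[idx]                         # per-weight bin width
    noise = (torch.rand_like(w) - 0.5) * widths                  # U(-width/2, +width/2)
    return w + noise

def conv_bops(c_in: int, c_out: int, ksize: int, out_hw: int, b_w: int, b_a: int) -> int:
    # Assumed approximate bit-operations count for one conv layer: each MAC costs
    # b_a * b_w bit operations for the multiply, plus an accumulator of roughly
    # b_a + b_w + log2(c_in * ksize^2) bits for the add.
    macs = c_out * out_hw * out_hw * c_in * ksize * ksize
    acc_bits = b_a + b_w + math.ceil(math.log2(c_in * ksize * ksize))
    return macs * (b_a * b_w + acc_bits)
```

In a training loop one would perturb the full-precision weights with `inject_uniform_noise` in the forward pass while updating the unperturbed copies, then replace each weight with its quantile-bin representative at inference; accuracy would then be reported against the summed `conv_bops` over all layers.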
Page numbers: 1 / 4
Number of pages: 15