Weighted-Entropy-based Quantization for Deep Neural Networks

Cited by: 151
Authors
Park, Eunhyeok [1 ]
Ahn, Junwhan [2 ]
Yoo, Sungjoo [1 ]
Affiliations
[1] Seoul Natl Univ, Comp & Memory Architecture Lab, Seoul, South Korea
[2] Seoul Natl Univ, Design Automat Lab, Seoul, South Korea
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
National Research Foundation of Singapore;
DOI: 10.1109/CVPR.2017.761
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Quantization is considered one of the most effective methods of optimizing the inference cost of neural network models for deployment to mobile and embedded systems with tight resource constraints. In such approaches, it is critical to provide low-cost quantization under a tight accuracy-loss constraint (e.g., 1%). In this paper, we propose a novel method for quantizing weights and activations based on the concept of weighted entropy. Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy. This facilitates much more flexible exploitation of the accuracy-performance trade-off provided by different levels of quantization. Moreover, our scheme provides an automated quantization flow based on conventional training algorithms, which greatly reduces the design-time effort needed to quantize the network. According to our extensive evaluations on practical neural network models for image classification (AlexNet, GoogLeNet, and ResNet-50/101), object detection (R-FCN with ResNet-50), and language modeling (an LSTM network), our method achieves significant reductions in both model size and the amount of computation with minimal accuracy loss. Compared to existing quantization schemes, ours also provides higher accuracy under a similar resource constraint and requires much lower design effort.
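To make the weighted-entropy idea concrete, below is a minimal, hypothetical sketch in Python/NumPy (not the authors' released code or exact algorithm). Following the abstract's description at a high level, each weight's importance is approximated by its squared magnitude, weights are split into clusters, and a score S = -Σ_n w_n · P_n · log(P_n) combines each cluster's relative frequency P_n with its representative importance w_n; the function name and the boundary-index encoding of a clustering are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of weighted-entropy scoring (assumptions noted above).
# Weights are sorted by importance (squared magnitude) and split into
# clusters at the given boundary indices; the returned score
#     S = -sum_n w_n * P_n * log(P_n)
# could then drive a boundary search that keeps the clustering with the
# highest S, i.e., the most informative multi-bit quantization levels.

def weighted_entropy(weights, boundaries):
    """Score one candidate clustering; `boundaries` are split indices
    into the importance-sorted, flattened weight array."""
    importance = np.sort(weights.ravel() ** 2)   # sorted importance values
    total = importance.size
    score = 0.0
    start = 0
    for end in list(boundaries) + [total]:
        cluster = importance[start:end]
        if cluster.size > 0:
            p_n = cluster.size / total           # relative frequency P_n
            w_n = cluster.mean()                 # representative importance w_n
            score -= w_n * p_n * np.log(p_n)     # weighted-entropy term
        start = end
    return score

# Usage: compare two 4-cluster (2-bit) partitions of a random weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
print(weighted_entropy(w, [1024, 2048, 3072]))   # equal-sized clusters
print(weighted_entropy(w, [256, 1024, 3584]))    # skewed clusters
```

Under this sketch, a quantizer would enumerate or locally search boundary sets per layer and keep the one maximizing the score; the paper additionally quantizes activations and determines bit widths per target accuracy, which this toy example does not cover.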
Pages: 7197-7205 (9 pages)