Weighted-Entropy-based Quantization for Deep Neural Networks

Cited by: 151
Authors
Park, Eunhyeok [1 ]
Ahn, Junwhan [2 ]
Yoo, Sungjoo [1 ]
Affiliations
[1] Seoul Natl Univ, Comp & Memory Architecture Lab, Seoul, South Korea
[2] Seoul Natl Univ, Design Automat Lab, Seoul, South Korea
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
National Research Foundation of Singapore;
DOI: 10.1109/CVPR.2017.761
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Quantization is considered one of the most effective methods of optimizing the inference cost of neural network models for deployment to mobile and embedded systems with tight resource constraints. In such approaches, it is critical to provide low-cost quantization under a tight accuracy-loss constraint (e.g., 1%). In this paper, we propose a novel method for quantizing weights and activations based on the concept of weighted entropy. Unlike recent work on binary-weight neural networks, our approach is multi-bit quantization, in which weights and activations can be quantized by any number of bits depending on the target accuracy. This facilitates much more flexible exploitation of the accuracy-performance trade-off provided by different levels of quantization. Moreover, our scheme provides an automated quantization flow based on conventional training algorithms, which greatly reduces the design-time effort needed to quantize the network. According to our extensive evaluations on practical neural network models for image classification (AlexNet, GoogLeNet, and ResNet-50/101), object detection (R-FCN with ResNet-50), and language modeling (an LSTM network), our method achieves significant reductions in both model size and the amount of computation with minimal accuracy loss. Compared to existing quantization schemes, ours also provides higher accuracy under a similar resource constraint and requires much lower design effort.
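To make the weighted-entropy idea concrete, below is a minimal, hypothetical sketch in Python/NumPy (not the authors' released code or exact algorithm). Following the abstract's description at a high level, each weight's importance is approximated by its squared magnitude, weights are split into clusters, and a score S = -Σ_n w_n · P_n · log(P_n) combines each cluster's relative frequency P_n with its representative importance w_n; the function name and the boundary-index encoding of a clustering are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of weighted-entropy scoring (assumptions noted above).
# Weights are sorted by importance (squared magnitude) and split into
# clusters at the given boundary indices; the returned score
#     S = -sum_n w_n * P_n * log(P_n)
# could then drive a boundary search that keeps the clustering with the
# highest S, i.e., the most informative multi-bit quantization levels.

def weighted_entropy(weights, boundaries):
    """Score one candidate clustering; `boundaries` are split indices
    into the importance-sorted, flattened weight array."""
    importance = np.sort(weights.ravel() ** 2)   # sorted importance values
    total = importance.size
    score = 0.0
    start = 0
    for end in list(boundaries) + [total]:
        cluster = importance[start:end]
        if cluster.size > 0:
            p_n = cluster.size / total           # relative frequency P_n
            w_n = cluster.mean()                 # representative importance w_n
            score -= w_n * p_n * np.log(p_n)     # weighted-entropy term
        start = end
    return score

# Usage: compare two 4-cluster (2-bit) partitions of a random weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
print(weighted_entropy(w, [1024, 2048, 3072]))   # equal-sized clusters
print(weighted_entropy(w, [256, 1024, 3584]))    # skewed clusters
```

Under this sketch, a quantizer would enumerate or locally search boundary sets per layer and keep the one maximizing the score; the paper additionally quantizes activations and determines bit widths per target accuracy, which this toy example does not cover.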
Pages: 7197-7205 (9 pages)