Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Cited by: 306
Authors
Gong, Ruihao [1 ,2 ]
Liu, Xianglong [1 ]
Jiang, Shenghu [1 ,2 ]
Li, Tianxiang [2 ,3 ]
Hu, Peng [2 ]
Lin, Jiazhen [2 ]
Yu, Fengwei [2 ]
Yan, Junjie [2 ]
Affiliations
[1] Beihang University, State Key Laboratory of Software Development Environment, Beijing, People's Republic of China
[2] SenseTime Group Ltd, Hong Kong, People's Republic of China
[3] Beijing Institute of Technology, Beijing, People's Republic of China
Source
2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019) | 2019
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV.2019.00495
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for deploying models on resource-limited devices such as mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often suffer from unstable training and severe performance degradation. To address this problem, we propose Differentiable Soft Quantization (DSQ) to bridge the gap between full-precision and low-bit networks. DSQ evolves automatically during training to gradually approximate the standard quantizer. Owing to its differentiability, DSQ yields more accurate gradients in the backward pass and, given an appropriate clipping range, reduces the quantization error in the forward pass. Extensive experiments on several popular network architectures show that training low-bit neural networks with DSQ consistently outperforms state-of-the-art quantization methods. Moreover, our first efficient implementation of 2- to 4-bit DSQ on devices with the ARM architecture achieves up to 1.7x speedup over the open-source 8-bit high-performance inference framework NCNN [31].
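To make the abstract's idea concrete, below is a minimal PyTorch sketch of a DSQ-style soft quantizer: a piecewise tanh over a clipping range [l, u], with 2^b - 1 uniform intervals, that tightens toward the hard uniform quantizer as a sharpness variable alpha shrinks toward 0. The function name dsq_quantize, the parameter defaults, and the exact alpha parameterization are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import math
import torch

def dsq_quantize(x: torch.Tensor, l: float, u: float,
                 bits: int = 2, alpha: float = 0.2) -> torch.Tensor:
    """Softly quantize x to 2**bits levels over the clipping range [l, u].

    Illustrative DSQ-style sketch: each of the 2**bits - 1 intervals is
    covered by a scaled tanh, so the mapping is differentiable everywhere;
    as alpha -> 0 it approaches the hard staircase quantizer.
    """
    n = 2 ** bits - 1                      # number of quantization intervals
    delta = (u - l) / n                    # interval width
    x = torch.clamp(x, l, u)               # clipping range from the abstract

    # interval index i (piecewise constant) and interval midpoint m_i
    i = torch.clamp(torch.floor((x - l) / delta), max=n - 1)
    m = l + (i + 0.5) * delta

    # sharpness k and scale s chosen so each tanh piece spans [-1, 1]
    # and adjacent pieces join continuously at interval boundaries
    k = math.log(2.0 / alpha - 1.0) / delta
    s = 1.0 / (1.0 - alpha)

    phi = s * torch.tanh(k * (x - m))      # soft step on the current interval
    return l + delta * (i + 0.5 * (phi + 1.0))

# usage: softly quantize random weights to 2 bits
w = torch.randn(6)
print(dsq_quantize(w, l=-1.0, u=1.0, bits=2, alpha=0.05))
```

Because torch.floor has zero gradient almost everywhere, the interval index acts as a constant and gradients flow only through the tanh term, so training sees smooth, non-vanishing gradients instead of the all-zero gradient of hard rounding; shrinking alpha over the course of training then evolves the soft function toward standard quantization, matching the behavior the abstract describes.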
Pages: 4851-4860
Page count: 10
Related papers
46 in total
[1] Abadi M, 2016, Proceedings of OSDI'16: 12th USENIX Symposium on Operating Systems Design and Implementation, p. 265
[2] Courbariaux M, 2015, BinaryConnect: Training Deep Neural Networks with Binary Weights During Propagations
[3] [Anonymous], AAAI
[4] [Anonymous], 2018, P 1 REPR QUAL EFF SY
[5] Bengio Y, 2013, Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, CoRR abs/1308.3432
[6] Zhou S, 2016, DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, arXiv:1606.06160
[7] [Anonymous], 2015, arXiv.org
[8] Tencent, 2017, ncnn, https://github.com/Tencent/ncnn
[9] [Anonymous], 2017, 31st Conference on Neural Information Processing Systems (NIPS 2017)
[10] Banner R, 2018, arXiv:1810.05723