Training Integer-Only Deep Recurrent Neural Networks

Cited by: 0
Authors
Nia V.P. [1,2]
Sari E. [1]
Courville V. [1]
Asgharian M. [3]
Affiliations
[1] Huawei Noah’s Ark Lab, Montreal Research Centre, 7101 Park Avenue, Montreal, H3N 1X9, QC
[2] Department of Mathematics and Industrial Engineering, Polytechnique Montreal, 2500 Chem. Polytechnique, Montreal, H3T 1J4, QC
[3] Department of Mathematics and Statistics, McGill University, 805 Sherbrooke Street West, Montreal, H3A 0B9, QC
Keywords
ASR; LSTM; Model compression; NLP; Quantization; Recurrent neural network
DOI
10.1007/s42979-023-01920-z
Abstract
Recurrent neural networks (RNNs) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as non-linear activation functions, normalization, bi-directional dependence, and attention. To maintain good accuracy, these components are frequently run in full-precision floating-point arithmetic, making them slow, inefficient, and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize with standard quantization methods without a significant drop in performance. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, serving a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a 2× improvement in runtime and a 4× reduction in model size while maintaining accuracy similar to their full-precision counterparts. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
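As a rough illustration of the PWL activation idea mentioned in the abstract, the sketch below approximates the sigmoid by linear interpolation between a handful of knots and reports the approximation error. This is only a toy, float-valued example under assumed, hand-picked knot positions (the name pwl_sigmoid and its knots argument are hypothetical); the paper's method adapts the pieces during training and targets integer-only arithmetic, which this sketch does not attempt.

import numpy as np

def pwl_sigmoid(x, knots=(-6.0, -2.5, -1.0, 0.0, 1.0, 2.5, 6.0)):
    # Approximate sigmoid with a piecewise-linear function: exact at the knots,
    # linearly interpolated in between, clamped to the endpoint values outside.
    knots = np.asarray(knots)
    values = 1.0 / (1.0 + np.exp(-knots))  # exact sigmoid at the knots
    return np.interp(x, knots, values)     # linear interpolation between knots

x = np.linspace(-8.0, 8.0, 1001)
exact = 1.0 / (1.0 + np.exp(-x))
print("max |error| of PWL sigmoid:", np.abs(pwl_sigmoid(x) - exact).max())

Increasing the number of knots (or learning their positions, as the adaptive PWL in the paper does) trades a larger lookup/parameter cost for a tighter approximation.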