LEARNING EFFICIENT SPARSE STRUCTURES IN SPEECH RECOGNITION

Cited by: 0
Authors
Zhang, Jingchi [1 ]
Wen, Wei [1 ]
Deisher, Michael [2 ]
Cheng, Hsin-Pai [1 ]
Li, Hai [1 ]
Chen, Yiran [1 ]
Affiliations
[1] Duke Univ, Durham, NC 27708 USA
[2] Intel Corp, Hillsboro, OR USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Funding
US National Science Foundation;
Keywords
efficient structural sparsity; long short-term memory; acoustic modeling; speech recognition;
DOI
10.1109/icassp.2019.8683620
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, have been widely used in speech recognition and natural language processing. As the size of RNN models grows for better performance, the computation cost and the required hardware resources increase rapidly. We propose an efficient structural sparsity (ESS) learning method for acoustic modeling in speech recognition. ESS aims to produce a model that offers higher execution efficiency while maintaining accuracy. A three-step training pipeline is developed in our work. First, we apply group Lasso regularization during training to learn a structurally sparse model from scratch. Second, the learned sparse structures are fixed so that they can no longer change. Finally, we retrain the model, updating only the nonzero parameters. We applied the ESS method to a classic HMM+LSTM model in the Kaldi toolkit. The experimental results show that ESS removes 72.5% of the weight groups in the weight matrices while increasing the word error rate (WER) by only 1.1%.
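The following is a minimal sketch of the first step of the pipeline (group Lasso regularization during training), written against a PyTorch-style LSTM rather than the authors' Kaldi-based implementation. The choice of columns as the weight groups and the regularization strength lam are illustrative assumptions; the paper's actual group granularity may differ.

```python
# Sketch only: group Lasso penalty over column groups of LSTM weight matrices,
# encouraging whole groups to shrink toward zero during training.
import torch
import torch.nn as nn

def group_lasso_penalty(lstm: nn.LSTM, lam: float = 1e-4) -> torch.Tensor:
    """Sum of L2 norms of the column groups (here: single columns) of every
    input-to-hidden and hidden-to-hidden weight matrix in the LSTM.
    The group definition and lam are assumptions for illustration."""
    penalty = torch.zeros((), device=next(lstm.parameters()).device)
    for name, w in lstm.named_parameters():
        if "weight" in name:                      # skip bias vectors
            # each column of w is treated as one group
            penalty = penalty + w.norm(p=2, dim=0).sum()
    return lam * penalty

# Usage sketch: add the penalty to the task loss during training.
lstm = nn.LSTM(input_size=40, hidden_size=512, num_layers=2)
x = torch.randn(100, 8, 40)                       # (time, batch, features)
out, _ = lstm(x)
task_loss = out.pow(2).mean()                     # placeholder for the acoustic-model loss
loss = task_loss + group_lasso_penalty(lstm)
loss.backward()
```

After this regularized training converges, groups whose norms fall below a threshold would be pruned, their zero pattern frozen, and the remaining nonzero parameters retrained, mirroring the second and third steps described in the abstract.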
Pages: 2717-2721
Number of pages: 5