Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

Cited by: 13
Authors
Weng, Chao [1]
Yu, Chengzhu [1]
Cui, Jia [1]
Zhang, Chunlei [1]
Yu, Dong [1]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
Source
INTERSPEECH 2020 | 2020
Keywords
RNN-T; transformer; end-to-end speech recognition; sequential minimum Bayes risk training; MBR; shallow fusion; LVCSR
DOI
10.21437/Interspeech.2020-1221
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In this work, we propose minimum Bayes risk (MBR) training of the RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, starting from a model trained with the RNN-T loss, MBR training minimizes the expected edit distance between the reference label sequence and N-best hypotheses generated on the fly. We also introduce a heuristic for incorporating an external neural network language model (NNLM) into RNN-T beam search decoding, and we explore MBR training with the external NNLM. Experimental results demonstrate that an MBR-trained model substantially outperforms an RNN-T-trained model, and that further improvements can be achieved when training with an external NNLM. Our best MBR-trained system achieves absolute character error rate (CER) reductions of 1.2% and 0.5% on read and spontaneous Mandarin speech, respectively, over a strong convolution- and transformer-based RNN-T baseline trained on approximately 21,000 hours of speech.
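
The MBR objective described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the code assumes the common formulation in which hypothesis probabilities are renormalized over the N-best list before the expectation is taken. The function names `edit_distance` and `expected_edit_distance`, and the toy N-best list, are hypothetical.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expected_edit_distance(reference, nbest):
    """Expected edit distance over an N-best list.

    nbest is a list of (hypothesis_tokens, log_probability) pairs.
    Assumption: probabilities are renormalized over the N-best list.
    """
    max_lp = max(lp for _, lp in nbest)
    # Subtract the maximum log-probability for numerical stability.
    weights = [math.exp(lp - max_lp) for _, lp in nbest]
    total = sum(weights)
    risk = 0.0
    for (hyp, _), w in zip(nbest, weights):
        risk += (w / total) * edit_distance(reference, hyp)
    return risk


# Toy example with made-up hypotheses and scores.
reference = list("abc")
nbest = [
    (list("abc"), math.log(0.5)),
    (list("abd"), math.log(0.25)),
    (list("ab"), math.log(0.25)),
]
risk = expected_edit_distance(reference, nbest)
```

In actual training the log-probabilities would come from the model and the risk would be differentiated with respect to its parameters; the sketch only evaluates the quantity being minimized.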
Pages: 966-970
Page count: 5