Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

Cited by: 14

Authors:

Weng, Chao [1]

Yu, Chengzhu [1]

Cui, Jia [1]

Zhang, Chunlei [1]

Yu, Dong [1]

Affiliation:

[1] Tencent AI Lab, Bellevue, WA 98004 USA

Source:

INTERSPEECH 2020 | 2020

Keywords:

RNN-T; transformer; end-to-end speech recognition; sequential minimum Bayes risk training; MBR; shallow fusion; LVCSR

DOI:

10.21437/Interspeech.2020-1221

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

In this work, we propose minimum Bayes risk (MBR) training of the RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, initialized with an RNN-T-trained model, MBR training is conducted by minimizing the expected edit distance between the reference label sequence and N-best hypotheses generated on the fly. We also introduce a heuristic for incorporating an external neural network language model (NNLM) into RNN-T beam search decoding and explore MBR training with the external NNLM. Experimental results demonstrate that an MBR-trained model substantially outperforms an RNN-T-trained model, and that further improvements can be achieved when training with an external NNLM. Our best MBR-trained system achieves absolute character error rate (CER) reductions of 1.2% and 0.5% on read and spontaneous Mandarin speech, respectively, over a strong convolution- and transformer-based RNN-T baseline trained on approximately 21,000 hours of speech.
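The objective named in the abstract, the expected edit distance between the reference and an N-best list, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `edit_distance` and `mbr_risk` are hypothetical, and renormalizing the hypothesis probabilities over the N-best list is an assumption borrowed from common MBR practice, since the abstract does not specify how the expectation is approximated. In actual training the log-probabilities would come from the RNN-T model and the risk would be differentiated with respect to the model parameters; here they are plain numbers.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference labels to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_risk(reference, nbest):
    """Expected edit distance over an N-best list.

    nbest: list of (hypothesis, log_prob) pairs. Probabilities are
    renormalized over the list (an assumption, see the lead-in).
    """
    max_lp = max(lp for _, lp in nbest)  # subtract the max for numerical stability
    weights = [math.exp(lp - max_lp) for _, lp in nbest]
    total = sum(weights)
    return sum((w / total) * edit_distance(reference, hyp)
               for (hyp, _), w in zip(nbest, weights))


if __name__ == "__main__":
    # Toy N-best list with made-up scores, for illustration only.
    nbest = [("abc", math.log(0.5)), ("abd", math.log(0.25)), ("ab", math.log(0.25))]
    print(mbr_risk("abc", nbest))
```

With the toy list above, the correct hypothesis contributes zero risk and each of the two erroneous hypotheses contributes one edit weighted by its renormalized probability, so lowering the risk amounts to shifting probability mass toward hypotheses with fewer errors.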

Citation

Pages: 966-970

Page count: 5
