Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

Cited by: 14

Authors:

Weng, Chao [1]

Yu, Chengzhu [1]

Cui, Jia [1]

Zhang, Chunlei [1]

Yu, Dong [1]

Affiliation:

[1] Tencent AI Lab, Bellevue, WA 98004 USA

Source:

INTERSPEECH 2020 | 2020

Keywords:

RNN-T; transformer; end-to-end speech recognition; sequential minimum Bayes risk training; MBR; shallow fusion; LVCSR

DOI:

10.21437/Interspeech.2020-1221

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification code:

100104; 100213

Abstract:

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, initialized with an RNN-T trained model, MBR training is conducted by minimizing the expected edit distance between the reference label sequence and N-best hypotheses generated on the fly. We also introduce a heuristic to incorporate an external neural network language model (NNLM) in RNN-T beam search decoding and explore MBR training with the external NNLM. Experimental results demonstrate that an MBR-trained model substantially outperforms an RNN-T trained model, and that further improvements can be achieved when training with an external NNLM. Our best MBR-trained system achieves absolute character error rate (CER) reductions of 1.2% and 0.5% on read and spontaneous Mandarin speech, respectively, over a strong convolution and transformer based RNN-T baseline trained on approximately 21,000 hours of speech.
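The MBR objective described in the abstract, an expected edit distance over an N-best list, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `edit_distance` and `expected_edit_distance` are hypothetical, the hypothesis scores are assumed to be log-probabilities from a beam search, and renormalizing the posterior over the N-best list is an assumption made here for illustration. The sketch only evaluates the risk; it does not compute gradients or run a decoder.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def expected_edit_distance(reference, nbest):
    """Expected edit distance over an N-best list.

    nbest is a list of (hypothesis, log_score) pairs. The scores are
    renormalized over the list (an assumption of this sketch) so that
    they form a probability distribution over the hypotheses.
    """
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)
    return sum(
        (w / total) * edit_distance(reference, hyp)
        for w, (hyp, _) in zip(weights, nbest)
    )


# Illustrative values only: a 3-best list for a four-token reference.
reference = list("abcd")
nbest = [
    (list("abcd"), math.log(0.5)),  # correct hypothesis, 0 errors
    (list("abxd"), math.log(0.3)),  # one substitution
    (list("abd"), math.log(0.2)),   # one deletion
]
risk = expected_edit_distance(reference, nbest)
```

With these illustrative values the risk is 0.5 · 0 + 0.3 · 1 + 0.2 · 1 = 0.5. Lowering it requires shifting probability mass toward hypotheses with fewer errors, which is the intuition behind training on this objective.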


Pages: 966-970

Page count: 5
