O-1: Self-training with Oracle and 1-best Hypothesis

Citations: 0
Authors
Baskar, Murali Karthick [1 ]
Rosenberg, Andrew [1 ]
Ramabhadran, Bhuvana [1 ]
Audhkhasi, Kartik [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
INTERSPEECH 2023 | 2023
Keywords
Self-training; EMBR; O-1; ASR; speech recognition; discriminative training;
DOI
10.21437/Interspeech.2023-2166
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
We introduce O-1, a new self-training objective that reduces training bias and unifies training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR) training that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition performance on the publicly available SpeechStew datasets and a large-scale, in-house dataset. On SpeechStew, the O-1 objective closes the gap between actual and oracle performance by 80% relative, compared to 43% relative for EMBR. O-1 achieves 13% to 25% relative improvement over EMBR on the various datasets that make up SpeechStew, and a 12% relative reduction of the gap to the oracle WER over EMBR training on the in-house dataset. Overall, O-1 yields a 9% relative improvement in WER over EMBR, speaking to the scalability of the proposed objective to large-scale datasets.
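The abstract contrasts O-1 with EMBR: EMBR minimizes the expected WER over an n-best list weighted by renormalized hypothesis probabilities, whereas O-1 instead boosts the oracle (lowest-WER) hypothesis. Below is a minimal, hypothetical sketch of that distinction over a toy n-best list; the function names, the softmax renormalization, and the exact loss forms are illustrative assumptions, not the paper's actual formulation.

```python
import math

def _normalized_probs(logprobs):
    """Softmax-renormalize n-best log-probabilities (illustrative assumption)."""
    m = max(logprobs)
    exps = [math.exp(lp - m) for lp in logprobs]
    z = sum(exps)
    return [e / z for e in exps]

def embr_style_loss(logprobs, wers):
    """EMBR-style risk: expected WER over the n-best list, sum_i p_i * WER_i."""
    probs = _normalized_probs(logprobs)
    return sum(p * w for p, w in zip(probs, wers))

def o1_style_loss(logprobs, wers):
    """Hypothetical O-1 sketch: boost the oracle hypothesis by maximizing the
    normalized probability of the lowest-WER entry in the n-best list."""
    probs = _normalized_probs(logprobs)
    oracle = min(range(len(wers)), key=lambda i: wers[i])
    return -math.log(probs[oracle])

# Toy n-best list: the second hypothesis is the oracle (WER 0.0) but is not
# the 1-best by model score, illustrating the gap O-1 tries to close.
logprobs = [-1.0, -2.0, -3.0]
wers = [0.5, 0.0, 0.25]
print(round(embr_style_loss(logprobs, wers), 4))  # → 0.3551 (expected risk)
print(round(o1_style_loss(logprobs, wers), 4))    # → 1.4076 (oracle-boosting loss)
```

Minimizing the O-1-style loss pushes probability mass onto the oracle hypothesis, so that the decoded 1-best approaches oracle WER over training.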
Pages: 77-81
Page count: 5