O-1: Self-training with Oracle and 1-best Hypothesis

Citations: 0
Authors
Baskar, Murali Karthick [1 ]
Rosenberg, Andrew [1 ]
Ramabhadran, Bhuvana [1 ]
Audhkhasi, Kartik [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
INTERSPEECH 2023 | 2023
Keywords
Self-training; EMBR; O-1; ASR; speech recognition; discriminative training;
DOI
10.21437/Interspeech.2023-2166
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
We introduce O-1, a new self-training objective that reduces training bias and unifies training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR) training that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition performance on the publicly available SpeechStew datasets and a large-scale, in-house dataset. On SpeechStew, the O-1 objective closes the gap between actual and oracle performance by 80% relative, compared to 43% relative for EMBR. O-1 achieves 13% to 25% relative improvement over EMBR on the various datasets that make up SpeechStew, and a 12% relative reduction of the gap to the oracle WER over EMBR training on the in-house dataset. Overall, O-1 yields a 9% relative improvement in WER over EMBR, speaking to the scalability of the proposed objective to large-scale datasets.
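The abstract contrasts O-1 with EMBR: EMBR minimizes the expected WER over an n-best list weighted by renormalized hypothesis probabilities, whereas O-1 instead boosts the oracle (lowest-WER) hypothesis. Below is a minimal, hypothetical sketch of that distinction over a toy n-best list; the function names, the softmax renormalization, and the exact loss forms are illustrative assumptions, not the paper's actual formulation.

```python
import math

def _normalized_probs(logprobs):
    """Softmax-renormalize n-best log-probabilities (illustrative assumption)."""
    m = max(logprobs)
    exps = [math.exp(lp - m) for lp in logprobs]
    z = sum(exps)
    return [e / z for e in exps]

def embr_style_loss(logprobs, wers):
    """EMBR-style risk: expected WER over the n-best list, sum_i p_i * WER_i."""
    probs = _normalized_probs(logprobs)
    return sum(p * w for p, w in zip(probs, wers))

def o1_style_loss(logprobs, wers):
    """Hypothetical O-1 sketch: boost the oracle hypothesis by maximizing the
    normalized probability of the lowest-WER entry in the n-best list."""
    probs = _normalized_probs(logprobs)
    oracle = min(range(len(wers)), key=lambda i: wers[i])
    return -math.log(probs[oracle])

# Toy n-best list: the second hypothesis is the oracle (WER 0.0) but is not
# the 1-best by model score, illustrating the gap O-1 tries to close.
logprobs = [-1.0, -2.0, -3.0]
wers = [0.5, 0.0, 0.25]
print(round(embr_style_loss(logprobs, wers), 4))  # → 0.3551 (expected risk)
print(round(o1_style_loss(logprobs, wers), 4))    # → 1.4076 (oracle-boosting loss)
```

Minimizing the O-1-style loss pushes probability mass onto the oracle hypothesis, so that the decoded 1-best approaches oracle WER over training.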
Pages: 77-81
Page count: 5