Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

Cited by: 10
Authors
Dey, Subhadeep [1 ]
Motlicek, Petr [1 ]
Bui, Trung [2 ]
Dernoncourt, Franck [2 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Adobe Res, San Jose, CA USA
Source
INTERSPEECH 2019
Keywords
speech recognition; semi-supervised learning; end-to-end ASR; dropout; neural networks
DOI
10.21437/Interspeech.2019-3246
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on a limited amount of labelled data. Additional unlabelled speech data is then exploited through a data-selection mechanism: the best hypothesized output is selected and used to retrain the seed model. However, the model's uncertainty may not be well captured by a single hypothesis. In contrast to this single-hypothesis technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data-selection process is also applied to these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and a proprietary internal Adobe dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model.
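A minimal sketch of the dropout-based multi-hypothesis step described in the abstract: the seed model decodes each unlabelled utterance several times with dropout left active, and only utterances whose hypotheses agree are kept as pseudo-labelled data for retraining. The `model.decode()` interface, the number of samples, the agreement threshold, and the pairwise string-similarity measure are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of dropout-based multi-hypothesis generation and data selection
# for semi-supervised ASR. Model interface and thresholds are assumed, not taken
# from the paper.
import torch
from difflib import SequenceMatcher

NUM_SAMPLES = 5            # hypothesized transcripts per utterance (assumed value)
AGREEMENT_THRESHOLD = 0.9  # minimum average pairwise similarity (assumed value)

def sample_hypotheses(model, features, num_samples=NUM_SAMPLES):
    """Decode one unlabelled utterance several times with dropout kept active,
    yielding multiple hypothesized transcripts (Monte Carlo dropout)."""
    model.train()          # in PyTorch, train() keeps dropout layers stochastic
    hyps = []
    with torch.no_grad():  # no gradients needed; we only want diverse decodings
        for _ in range(num_samples):
            hyps.append(model.decode(features))  # decode() is a placeholder API
    model.eval()
    return hyps

def agreement(hyps):
    """Average pairwise string similarity, used as a proxy for model certainty."""
    if len(hyps) < 2:
        return 1.0
    scores = [SequenceMatcher(None, a, b).ratio()
              for i, a in enumerate(hyps) for b in hyps[i + 1:]]
    return sum(scores) / len(scores)

def select_for_retraining(model, unlabelled_utterances):
    """Keep utterances whose dropout hypotheses agree, pairing each with its
    first hypothesis as a pseudo-label for retraining the seed model."""
    selected = []
    for utt_id, features in unlabelled_utterances:
        hyps = sample_hypotheses(model, features)
        if agreement(hyps) >= AGREEMENT_THRESHOLD:
            selected.append((utt_id, hyps[0]))
    return selected
```

Keeping the network in training mode at decoding time is what makes the dropout layers stochastic; averaging pairwise transcript similarity is one simple way to realize the agreement-based data selection the abstract alludes to.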
Pages: 734-738
Number of pages: 5
Related papers
50 records in total
  • [31] Tic action recognition for children tic disorder with end-to-end video semi-supervised learning
    Wang, Xiangyang
    Yang, Kun
    Ding, Qiang
    Wang, Rui
    Sun, Jinhua
    VISUAL COMPUTER, 2025,
  • [32] SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
    Fu, Li
    Li, Xiaoxiao
    Wang, Runyu
    Fan, Lu
    Zhang, Zhengchen
    Chen, Meng
    Wu, Youzheng
    He, Xiaodong
    INTERSPEECH 2022, 2022, : 1006 - 1010
  • [33] SELF-TRAINING FOR END-TO-END SPEECH RECOGNITION
    Kahn, Jacob
    Lee, Ann
    Hannun, Awni
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7084 - 7088
  • [34] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition
    Sun, Sining
    Guo, Pengcheng
    Xie, Lei
    Hwang, Mei-Yuh
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1826 - 1838
  • [35] Speech-and-Text Transformer: Exploiting Unpaired Text for End-to-End Speech Recognition
    Wang, Qinyi
    Zhou, Xinyuan
    Li, Haizhou
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2023, 12 (01)
  • [36] End-to-End Speech Recognition Sequence Training With Reinforcement Learning
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE ACCESS, 2019, 7 : 79758 - 79769
  • [37] Improved training for online end-to-end speech recognition systems
    Kim, Suyoun
    Seltzer, Michael L.
    Li, Jinyu
    Zhao, Rui
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2913 - 2917
  • [38] SEQUENCE NOISE INJECTED TRAINING FOR END-TO-END SPEECH RECOGNITION
    Saon, George
    Tuske, Zoltan
    Audhkhasi, Kartik
    Kingsbury, Brian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6261 - 6265
  • [39] CYCLE-CONSISTENCY TRAINING FOR END-TO-END SPEECH RECOGNITION
    Hori, Takaaki
    Astudillo, Ramon
    Hayashi, Tomoki
    Zhang, Yu
    Watanabe, Shinji
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6271 - 6275
  • [40] Multitask Training with Text Data for End-to-End Speech Recognition
    Wang, Peidong
    Sainath, Tara N.
    Weiss, Ron J.
    INTERSPEECH 2021, 2021, : 2566 - 2570