Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

Cited by: 10
Authors
Dey, Subhadeep [1]
Motlicek, Petr [1]
Bui, Trung [2]
Dernoncourt, Franck [2]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Adobe Res, San Jose, CA USA
Keywords
speech recognition; semi-supervised learning; end-to-end ASR; dropout; neural networks
DOI
10.21437/Interspeech.2019-3246
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data-selection mechanism to obtain the best hypothesized output, further used to retrain the seed model. However, uncertainties of the model may not be well captured with a single hypothesis. As opposed to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data-selection process is also applied to these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and Adobe's proprietary internal dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model.
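The pipeline described in the abstract (decode each unlabelled utterance several times with dropout active, then apply data selection to the resulting hypotheses) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the selection criterion, so the agreement-based score below is an assumption, and `toy_decode`, `disagreement`, `select_pseudo_labels`, and the `threshold` value are hypothetical names and settings chosen only for illustration.

```python
import random
from itertools import combinations


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def toy_decode(reference, drop_prob, rng):
    """Hypothetical stand-in for decoding with dropout kept on at inference.

    A real seed model would run a forward pass with random dropout masks;
    here each word is simply replaced by "<unk>" with probability drop_prob.
    """
    return [w if rng.random() >= drop_prob else "<unk>" for w in reference.split()]


def disagreement(hyps):
    """Mean pairwise edit distance, normalised by mean hypothesis length."""
    pairs = list(combinations(hyps, 2))
    if not pairs:
        return 0.0
    mean_dist = sum(edit_distance(a, b) for a, b in pairs) / len(pairs)
    mean_len = sum(len(h) for h in hyps) / len(hyps)
    return mean_dist / max(mean_len, 1.0)


def select_pseudo_labels(unlabelled, n_hyps=5, threshold=0.3, seed=0):
    """Keep utterances whose dropout hypotheses agree closely enough.

    `unlabelled` maps an utterance id to (text stub, toy noise level).
    The threshold is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    selected = []
    for utt_id, (stub, noise) in unlabelled.items():
        hyps = [toy_decode(stub, noise, rng) for _ in range(n_hyps)]
        if disagreement(hyps) <= threshold:
            selected.append((utt_id, [" ".join(h) for h in hyps]))
    return selected
```

In a real system the selected utterances, paired with their hypothesized transcripts, would be added to the labelled data to retrain the seed model; that retraining step is omitted here.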
Pages: 734-738
Page count: 5
Related papers
50 items in total
  • [21] SEMI-SUPERVISED TRANSFER LEARNING FOR LANGUAGE EXPANSION OF END-TO-END SPEECH RECOGNITION MODELS TO LOW-RESOURCE LANGUAGES
    Kim, Jiyeon
    Kumar, Mehul
    Gowda, Dhananjaya
    Garg, Abhinav
    Kim, Chanwoo
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 984 - 988
  • [22] Semi-Supervised End-to-End Learning for Integrated Sensing and Communications
    Mateos-Ramos, Jose Miguel
    Chatelier, Baptiste
    Hager, Christian
    Keskin, Musa Furkan
    Le Magoarou, Luc
    Wymeersch, Henk
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024, : 132 - 138
  • [23] GrowingNet: An end-to-end growing network for semi-supervised learning
    Zhang, Qifei
    Yu, Xiaomo
    COMPUTER COMMUNICATIONS, 2020, 151 : 208 - 215
  • [24] ACTIVEMATCH: END-TO-END SEMI-SUPERVISED ACTIVE REPRESENTATION LEARNING
    Yuan, Xinkai
    Li, Zilinghan
    Wang, Gaoang
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1136 - 1140
  • [25] End-to-End Semi-supervised Learning for Differentiable Particle Filters
    Wen, Hao
    Chen, Xiongjie
    Papagiannis, Georgios
    Hu, Conghui
    Li, Yunpeng
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 5825 - 5831
  • [26] End-to-End Semi-Supervised Learning for Video Action Detection
    Kumar, Akash
    Rawat, Yogesh Singh
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14680 - 14690
  • [27] End-to-End Semi-Supervised Object Detection with Soft Teacher
    Xu, Mengde
    Zhang, Zheng
    Hu, Han
    Wang, Jianfeng
    Wang, Lijuan
    Wei, Fangyun
    Bai, Xiang
    Liu, Zicheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3040 - 3049
  • [28] Semi-Supervised Learning with Data Augmentation for End-to-End ASR
    Weninger, Felix
    Mana, Franco
    Gemello, Roberto
    Andres-Ferrer, Jesus
    Zhan, Puming
    INTERSPEECH 2020, 2020, : 2802 - 2806
  • [29] Framewise Supervised Training towards End-to-End Speech Recognition Models: First Results
    Li, Mohan
    Cao, Yuanjiang
    Zhou, Weicong
    Liu, Min
    INTERSPEECH 2019, 2019, : 1641 - 1645
  • [30] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569