Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

Cited by: 10
Authors
Dey, Subhadeep [1 ]
Motlicek, Petr [1 ]
Bui, Trung [2 ]
Dernoncourt, Franck [2 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Adobe Res, San Jose, CA USA
Source
INTERSPEECH 2019
Keywords
speech recognition; semi-supervised learning; end-to-end ASR; dropout; neural networks
DOI
10.21437/Interspeech.2019-3246
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on a limited amount of labelled data. Additional unlabelled speech data is then exploited through a data-selection mechanism: the best hypothesized output is selected and used to retrain the seed model. However, the model's uncertainty may not be well captured by a single hypothesis. In contrast to this single-hypothesis technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data-selection process is also applied to these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and a proprietary internal Adobe dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model.
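A minimal sketch of the dropout-based multi-hypothesis step described in the abstract: the seed model decodes each unlabelled utterance several times with dropout left active, and only utterances whose hypotheses agree are kept as pseudo-labelled data for retraining. The `model.decode()` interface, the number of samples, the agreement threshold, and the pairwise string-similarity measure are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of dropout-based multi-hypothesis generation and data selection
# for semi-supervised ASR. Model interface and thresholds are assumed, not taken
# from the paper.
import torch
from difflib import SequenceMatcher

NUM_SAMPLES = 5            # hypothesized transcripts per utterance (assumed value)
AGREEMENT_THRESHOLD = 0.9  # minimum average pairwise similarity (assumed value)

def sample_hypotheses(model, features, num_samples=NUM_SAMPLES):
    """Decode one unlabelled utterance several times with dropout kept active,
    yielding multiple hypothesized transcripts (Monte Carlo dropout)."""
    model.train()          # in PyTorch, train() keeps dropout layers stochastic
    hyps = []
    with torch.no_grad():  # no gradients needed; we only want diverse decodings
        for _ in range(num_samples):
            hyps.append(model.decode(features))  # decode() is a placeholder API
    model.eval()
    return hyps

def agreement(hyps):
    """Average pairwise string similarity, used as a proxy for model certainty."""
    if len(hyps) < 2:
        return 1.0
    scores = [SequenceMatcher(None, a, b).ratio()
              for i, a in enumerate(hyps) for b in hyps[i + 1:]]
    return sum(scores) / len(scores)

def select_for_retraining(model, unlabelled_utterances):
    """Keep utterances whose dropout hypotheses agree, pairing each with its
    first hypothesis as a pseudo-label for retraining the seed model."""
    selected = []
    for utt_id, features in unlabelled_utterances:
        hyps = sample_hypotheses(model, features)
        if agreement(hyps) >= AGREEMENT_THRESHOLD:
            selected.append((utt_id, hyps[0]))
    return selected
```

Keeping the network in training mode at decoding time is what makes the dropout layers stochastic; averaging pairwise transcript similarity is one simple way to realize the agreement-based data selection the abstract alludes to.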
Pages: 734-738
Number of pages: 5
Related papers
50 records in total
  • [31] Tic action recognition for children tic disorder with end-to-end video semi-supervised learning
    Wang, Xiangyang
    Yang, Kun
    Ding, Qiang
    Wang, Rui
    Sun, Jinhua
    VISUAL COMPUTER, 2025,
  • [32] SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
    Fu, Li
    Li, Xiaoxiao
    Wang, Runyu
    Fan, Lu
    Zhang, Zhengchen
    Chen, Meng
    Wu, Youzheng
    He, Xiaodong
    INTERSPEECH 2022, 2022, : 1006 - 1010
  • [33] SELF-TRAINING FOR END-TO-END SPEECH RECOGNITION
    Kahn, Jacob
    Lee, Ann
    Hannun, Awni
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7084 - 7088
  • [34] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition
    Sun, Sining
    Guo, Pengcheng
    Xie, Lei
    Hwang, Mei-Yuh
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1826 - 1838
  • [35] Speech-and-Text Transformer: Exploiting Unpaired Text for End-to-End Speech Recognition
    Wang, Qinyi
    Zhou, Xinyuan
    Li, Haizhou
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2023, 12 (01)
  • [36] End-to-End Speech Recognition Sequence Training With Reinforcement Learning
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE ACCESS, 2019, 7 : 79758 - 79769
  • [37] Improved training for online end-to-end speech recognition systems
    Kim, Suyoun
    Seltzer, Michael L.
    Li, Jinyu
    Zhao, Rui
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2913 - 2917
  • [38] SEQUENCE NOISE INJECTED TRAINING FOR END-TO-END SPEECH RECOGNITION
    Saon, George
    Tuske, Zoltan
    Audhkhasi, Kartik
    Kingsbury, Brian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6261 - 6265
  • [39] CYCLE-CONSISTENCY TRAINING FOR END-TO-END SPEECH RECOGNITION
    Hori, Takaaki
    Astudillo, Ramon
    Hayashi, Tomoki
    Zhang, Yu
    Watanabe, Shinji
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6271 - 6275
  • [40] Multitask Training with Text Data for End-to-End Speech Recognition
    Wang, Peidong
    Sainath, Tara N.
    Weiss, Ron J.
    INTERSPEECH 2021, 2021, : 2566 - 2570