Learning When to Trust Which Teacher for Weakly Supervised ASR

Cited by: 0
Authors
Agrawal, Aakriti [1 ,2 ]
Rao, Milind [2 ]
Sahu, Anit Kumar [2 ]
Chennupati, Gopinath [2 ]
Stolcke, Andreas [2 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Amazon Alexa AI, Bellevue, WA USA
Source
INTERSPEECH 2023 | 2023
Keywords
ASR; teacher-student training; semi-supervised learning; self-supervised learning; ROVER
DOI
10.21437/Interspeech.2023-2205
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Automatic speech recognition (ASR) training can utilize multiple experts as teacher models, each trained on a specific domain or accent. Teacher models may be opaque in nature, since their architectures may not be known or their training cadence may differ from that of the student ASR model. Still, the student models are updated incrementally using pseudo-labels generated independently by the expert teachers. In this paper, we exploit supervision from multiple domain experts in training student ASR models. This training strategy is especially useful in scenarios where few or no human transcriptions are available. To that end, we propose a Smart-Weighter mechanism that selects an appropriate expert based on the input audio and then trains the student model in an unsupervised setting. We show the efficacy of our approach on the LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all experts, use a single expert model, or combine experts using ROVER.
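The record does not include reference code, so the following is a minimal sketch of what a per-utterance expert selector of this kind might look like, written in PyTorch under assumptions of our own: 80-dimensional log-mel input features, three expert teachers, and mean-pooling over time. All names (SmartWeighter, scorer, etc.) are illustrative, not the authors' implementation.

import torch
import torch.nn as nn

class SmartWeighter(nn.Module):
    """Scores each expert teacher for a given utterance (illustrative sketch).

    Frame-level audio features are mean-pooled into an utterance embedding,
    and a small feed-forward network predicts a weight per expert; the
    highest-weighted expert's pseudo-labels would then supervise the student.
    """
    def __init__(self, feat_dim: int, num_experts: int, hidden_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_experts),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); pool over the time axis
        utt_emb = feats.mean(dim=1)
        # Softmax yields a normalized trust weight per expert
        return torch.softmax(self.scorer(utt_emb), dim=-1)

# Usage sketch: pick the most trusted expert per utterance; the student
# would then be trained on that expert's pseudo-labels (decoding elided).
weighter = SmartWeighter(feat_dim=80, num_experts=3)
feats = torch.randn(4, 200, 80)           # batch of 4 utterances, 200 frames
expert_weights = weighter(feats)          # shape (4, 3)
chosen_expert = expert_weights.argmax(dim=-1)

The soft weights could equally be used to interpolate the experts' pseudo-label losses rather than selecting one; since the abstract describes selecting an appropriate expert, a hard argmax is shown here.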
Pages: 381-385
Number of pages: 5
Related Papers
50 items total
  • [1] Noisy Student Teacher Training with Self Supervised Learning for Children ASR
    Chaturvedi, Shreya S.
    Sailor, Hardik B.
    Patil, Hemant A.
2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022
  • [2] Learning Representations for Weakly Supervised Natural Language Processing Tasks
    Huang, Fei
    Ahuja, Arun
    Downey, Doug
    Yang, Yi
    Guo, Yuhong
    Yates, Alexander
    COMPUTATIONAL LINGUISTICS, 2014, 40 (01) : 85 - 120
  • [3] All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation
    Sun, Weixuan
    Zhang, Yanhao
    Qin, Zhen
    Liu, Zheyuan
    Cheng, Lin
    Wang, Fanyi
    Zhong, Yiran
    Barnes, Nick
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 826 - 837
  • [4] CONTRASTIVE SEMI-SUPERVISED LEARNING FOR ASR
    Xiao, Alex
    Fuegen, Christian
    Mohamed, Abdelrahman
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3870 - 3874
  • [5] Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection
    He, Rundong
    Li, Rongxue
    Han, Zhongyi
    Yang, Xihong
    Yin, Yilong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4858 - 4866
  • [6] Biased Self-supervised learning for ASR
    Kreyssig, Florian L.
    Shi, Yangyang
    Guo, Jinxi
    Sari, Leda
    Mohamed, Abdelrahman
    Woodland, Philip C.
    INTERSPEECH 2023, 2023, : 4948 - 4952
  • [7] Learning to Selectively Learn for Weakly Supervised Paraphrase Generation with Model-based Reinforcement Learning
    Yin, Haiyan
    Li, Dingcheng
    Li, Ping
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1385 - 1395
  • [8] On the Learning Dynamics of Semi-Supervised Training for ASR
    Wallington, Electra
    Kershenbaum, Benji
    Klejch, Ondrej
    Bell, Peter
    INTERSPEECH 2021, 2021, : 716 - 720
  • [9] Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution
    Zhang, Zi-qiang
    Song, Yan
    Zhang, Jian-shu
    McLoughlin, Ian
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 3580 - 3584
  • [10] LEARNING BETWEEN DIFFERENT TEACHER AND STUDENT MODELS IN ASR
    Wong, Jeremy H. M.
    Gales, Mark J. F.
    Wang, Yu
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 93 - 99