Learning When to Trust Which Teacher for Weakly Supervised ASR

Cited by: 0
Authors
Agrawal, Aakriti [1 ,2 ]
Rao, Milind [2 ]
Sahu, Anit Kumar [2 ]
Chennupati, Gopinath [2 ]
Stolcke, Andreas [2 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Amazon Alexa AI, Bellevue, WA USA
Source
INTERSPEECH 2023 | 2023
Keywords
ASR; teacher-student training; semi-supervised learning; self-supervised learning; ROVER
DOI
10.21437/Interspeech.2023-2205
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Automatic speech recognition (ASR) training can utilize multiple experts as teacher models, each trained on a specific domain or accent. Teacher models may be opaque in nature, since their architectures may not be known or their training cadence may differ from that of the student ASR model. Still, the student models are updated incrementally using pseudo-labels generated independently by the expert teachers. In this paper, we exploit supervision from multiple domain experts in training student ASR models. This training strategy is especially useful in scenarios where few or no human transcriptions are available. To that end, we propose a Smart-Weighter mechanism that selects an appropriate expert based on the input audio and then trains the student model in an unsupervised setting. We show the efficacy of our approach on the LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all experts, use a single expert model, or combine experts using ROVER.
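The record does not include reference code, so the following is a minimal sketch of what a per-utterance expert selector of this kind might look like, written in PyTorch under assumptions of our own: 80-dimensional log-mel input features, three expert teachers, and mean-pooling over time. All names (SmartWeighter, scorer, etc.) are illustrative, not the authors' implementation.

import torch
import torch.nn as nn

class SmartWeighter(nn.Module):
    """Scores each expert teacher for a given utterance (illustrative sketch).

    Frame-level audio features are mean-pooled into an utterance embedding,
    and a small feed-forward network predicts a weight per expert; the
    highest-weighted expert's pseudo-labels would then supervise the student.
    """
    def __init__(self, feat_dim: int, num_experts: int, hidden_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_experts),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); pool over the time axis
        utt_emb = feats.mean(dim=1)
        # Softmax yields a normalized trust weight per expert
        return torch.softmax(self.scorer(utt_emb), dim=-1)

# Usage sketch: pick the most trusted expert per utterance; the student
# would then be trained on that expert's pseudo-labels (decoding elided).
weighter = SmartWeighter(feat_dim=80, num_experts=3)
feats = torch.randn(4, 200, 80)           # batch of 4 utterances, 200 frames
expert_weights = weighter(feats)          # shape (4, 3)
chosen_expert = expert_weights.argmax(dim=-1)

The soft weights could equally be used to interpolate the experts' pseudo-label losses rather than selecting one; since the abstract describes selecting an appropriate expert, a hard argmax is shown here.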
Pages: 381-385
Number of pages: 5
Related Papers
50 items total
  • [1] Noisy Student Teacher Training with Self Supervised Learning for Children ASR
    Chaturvedi, Shreya S.
    Sailor, Hardik B.
    Patil, Hemant A.
2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022
  • [2] Learning Representations for Weakly Supervised Natural Language Processing Tasks
    Huang, Fei
    Ahuja, Arun
    Downey, Doug
    Yang, Yi
    Guo, Yuhong
    Yates, Alexander
    COMPUTATIONAL LINGUISTICS, 2014, 40 (01) : 85 - 120
  • [3] All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation
    Sun, Weixuan
    Zhang, Yanhao
    Qin, Zhen
    Liu, Zheyuan
    Cheng, Lin
    Wang, Fanyi
    Zhong, Yiran
    Barnes, Nick
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 826 - 837
  • [4] CONTRASTIVE SEMI-SUPERVISED LEARNING FOR ASR
    Xiao, Alex
    Fuegen, Christian
    Mohamed, Abdelrahman
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3870 - 3874
  • [5] Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection
    He, Rundong
    Li, Rongxue
    Han, Zhongyi
    Yang, Xihong
    Yin, Yilong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4858 - 4866
  • [6] Biased Self-supervised learning for ASR
    Kreyssig, Florian L.
    Shi, Yangyang
    Guo, Jinxi
    Sari, Leda
    Mohamed, Abdelrahman
    Woodland, Philip C.
    INTERSPEECH 2023, 2023, : 4948 - 4952
  • [7] Learning to Selectively Learn for Weakly Supervised Paraphrase Generation with Model-based Reinforcement Learning
    Yin, Haiyan
    Li, Dingcheng
    Li, Ping
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1385 - 1395
  • [8] On the Learning Dynamics of Semi-Supervised Training for ASR
    Wallington, Electra
    Kershenbaum, Benji
    Klejch, Ondrej
    Bell, Peter
    INTERSPEECH 2021, 2021, : 716 - 720
  • [9] Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution
    Zhang, Zi-qiang
    Song, Yan
    Zhang, Jian-shu
    McLoughlin, Ian
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 3580 - 3584
  • [10] LEARNING BETWEEN DIFFERENT TEACHER AND STUDENT MODELS IN ASR
    Wong, Jeremy H. M.
    Gales, Mark J. F.
    Wang, Yu
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 93 - 99