Noisy Student Teacher Training with Self Supervised Learning for Children ASR

Cited by: 1
Authors
Chaturvedi, Shreya S. [1]
Sailor, Hardik B. [2, 3]
Patil, Hemant A. [1]
Affiliations
[1] DA IICT, Speech Res Lab, Gandhinagar, India
[2] ASTAR, Inst Infocomm Res I2R, Singapore, Singapore
[3] Samsung R&D Inst, Bangalore, Karnataka, India
Source
2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM | 2022
DOI
10.1109/SPCOM55316.2022.9840763
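Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract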
Automatic Speech Recognition (ASR) is a fast-growing field in which reliable systems have been built for high-resource languages and for adult speech. However, such ASR systems perform poorly on children's speech because of its high acoustic variability and the scarcity of resources. In this paper, we propose to make extensive use of unlabeled data to develop an ASR system for low-resource children's speech. The state-of-the-art wav2vec 2.0 model is the baseline ASR technique used here. The baseline's performance is further enhanced using the intuition behind Noisy Student Teacher (NST) learning. The proposed technique is not limited to introducing soft labels (i.e., word-level transcriptions) for unlabeled data; it also adapts the learning of the teacher model or the preceding student model, which significantly reduces redundant training. To that effect, a detailed analysis is reported in this paper, as teacher and student learning differ. In the ASR experiments, character-level tokenization was used, and hence Connectionist Temporal Classification (CTC) loss was used for fine-tuning. Due to computational limitations, the experiments use approximately 12 hours of training data and 5 hours of development and test data from the standard My Science Tutor (My-ST) corpus. The baseline wav2vec 2.0 achieves 34% WER, and the proposed approach yields a relative improvement of 10%. Further, the performance loss and the effect of the language model are analyzed in detail.
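As a rough illustration of the figures quoted in the abstract, the sketch below computes word error rate (WER) from a word-level edit distance and shows what a 10% relative reduction from a 34% baseline would imply. It is not the authors' evaluation code: the function names and the example sentences are hypothetical, and it assumes the reported 10% is a relative WER reduction. The abstract does not state the resulting absolute WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def apply_relative_reduction(baseline_wer: float, relative_reduction: float) -> float:
    """Absolute WER after a relative reduction (assumed interpretation)."""
    return baseline_wer * (1.0 - relative_reduction)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion in six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Baseline of 34% WER with a 10% relative reduction.
    print(apply_relative_reduction(34.0, 0.10))
```

Under that assumption, a 10% relative reduction from 34% corresponds to roughly 30.6% WER (34 × 0.9).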
Pages: 5