Noisy Student Teacher Training with Self Supervised Learning for Children ASR

Cited by: 1
Authors
Chaturvedi, Shreya S. [1 ]
Sailor, Hardik B. [2 ,3 ]
Patil, Hemant A. [1 ]
Affiliations
[1] DA IICT, Speech Res Lab, Gandhinagar, India
[2] ASTAR, Inst Infocomm Res I2R, Singapore, Singapore
[3] Samsung R&D Inst, Bangalore, Karnataka, India
Source
2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM | 2022
DOI
10.1109/SPCOM55316.2022.9840763
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Codes
0808 ; 0809 ;
Abstract
Automatic Speech Recognition (ASR) is a fast-growing field in which reliable systems are built for high-resource languages and for adult speech. However, such ASR systems perform poorly on children's speech, owing to its large acoustic variability and the scarcity of resources. In this paper, we propose to exploit unlabeled data extensively to develop an ASR system for low-resource children's speech. The state-of-the-art wav2vec 2.0 model is the baseline ASR technique used here. The baseline's performance is further enhanced with the intuition of Noisy Student Teacher (NST) learning. The proposed technique is not limited to introducing soft labels (i.e., word-level transcriptions) for the unlabeled data; it also adapts the learning of the teacher model, or of the preceding student model, which significantly reduces redundant training. To that end, a detailed analysis is reported in this paper, as teacher and student learning differ. In the ASR experiments, character-level tokenization was used and, hence, the Connectionist Temporal Classification (CTC) loss was used for fine-tuning. Due to computational limitations, experiments were performed with approximately 12 hours of training data, and 5 hours of development and test data from the standard My Science Tutor (My-ST) corpus. The baseline wav2vec 2.0 achieves 34% WER, which the proposed approach improves by 10% relative. Furthermore, the performance loss and the effect of a language model are analyzed in detail.
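For readers unfamiliar with the CTC loss mentioned in the abstract, the following is a minimal, illustrative pure-Python sketch of the CTC forward (alpha) recursion that underlies it. This is not the authors' code (the paper fine-tunes wav2vec 2.0 with a framework-level CTC loss); the function name and tiny example are hypothetical, chosen only to show how CTC sums over all frame-level alignments of a character sequence.

```python
import math

def ctc_log_prob(log_probs, target, blank=0):
    """Forward (alpha) recursion of CTC: log P(target | log_probs).
    log_probs: per-frame lists of log-probabilities over the vocabulary.
    target: label sequence as ints, with no blanks."""
    # Extended label sequence with blanks: b, y1, b, y2, ..., b
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)
    NEG = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG:
            return NEG
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # Initialization: start in the leading blank or the first label.
    alpha = [NEG] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG] * S
        for s in range(S):
            cands = [alpha[s]]            # stay on the same symbol
            if s - 1 >= 0:
                cands.append(alpha[s - 1])  # advance by one
            # Skip the intermediate blank when adjacent labels differ.
            if s - 2 >= 0 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid endings: final label or trailing blank.
    return logsumexp(alpha[S - 1], alpha[S - 2]) if S > 1 else alpha[-1]
```

For example, with 2 frames, a vocabulary of {blank, 'a'} at uniform probability 0.5, and target ['a'], the alignments a·a, a·blank, and blank·a all collapse to "a", so P = 3 x 0.25 = 0.75; the recursion above reproduces this.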
Pages: 5
Related Papers
50 items in total
  • [31] DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
    Fan, Ruchao
    Alwan, Abeer
    INTERSPEECH 2022, 2022, : 4900 - 4904
  • [32] SEMI-SUPERVISED SINGING VOICE SEPARATION WITH NOISY SELF-TRAINING
    Wang, Zhepei
    Giri, Ritwik
    Isik, Umut
    Valin, Jean-Marc
    Krishnaswamy, Arvindh
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 31 - 35
  • [33] CONTRASTIVE SEMI-SUPERVISED LEARNING FOR ASR
    Xiao, Alex
    Fuegen, Christian
    Mohamed, Abdelrahman
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3870 - 3874
  • [34] Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
    Cao, Songjun
    Kang, Yueteng
    Fu, Yanzhe
    Xu, Xiaoshuo
    Sun, Sining
    Zhang, Yike
    Ma, Long
    INTERSPEECH 2021, 2021, : 706 - 710
  • [35] The student-teacher framework guided by self-training and consistency regularization for semi-supervised medical image segmentation
    Li, Boliang
    Xu, Yaming
    Wang, Yan
    Li, Luxiu
    Zhang, Bo
    PLOS ONE, 2024, 19 (04):
  • [36] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [37] Perceptive Self-Supervised Learning Network for Noisy Image Watermark Removal
    Tian, Chunwei
    Zheng, Menghua
    Li, Bo
    Zhang, Yanning
    Zhang, Shichao
    Zhang, David
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7069 - 7079
  • [38] Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches
    Sanada, Yutaro
    Nakagawa, Takumi
    Wada, Yuichiro
    Takanashi, Kosaku
    Zhang, Yuhui
    Tokuyama, Kiichi
    Kanamori, Takafumi
    Yamada, Tomonori
    INTERSPEECH 2022, 2022, : 1178 - 1182
  • [39] Self-Supervised Learning in the Twilight of Noisy Real-World Datasets
    Tendle, Atharva
    Little, Andrew
    Scott, Stephen
    Hasan, Mohammad Rashedul
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 461 - 464
  • [40] A Novel Self-Supervised Re-labeling Approach for Training with Noisy Labels
    Mandal, Devraj
    Bharadwaj, Shrisha
    Biswas, Soma
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1370 - 1379