Noisy Student Teacher Training with Self Supervised Learning for Children ASR

Cited by: 1
Authors
Chaturvedi, Shreya S. [1 ]
Sailor, Hardik B. [2 ,3 ]
Patil, Hemant A. [1 ]
Affiliations
[1] DA IICT, Speech Res Lab, Gandhinagar, India
[2] ASTAR, Inst Infocomm Res I2R, Singapore, Singapore
[3] Samsung R&D Inst, Bangalore, Karnataka, India
Source
2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM | 2022
DOI
10.1109/SPCOM55316.2022.9840763
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Codes
0808 ; 0809 ;
Abstract
Automatic Speech Recognition (ASR) is a fast-growing field in which reliable systems are built for high-resource languages and for adult speech. However, such ASR systems perform poorly on children's speech, owing to its large acoustic variability and the scarcity of resources. In this paper, we propose to exploit unlabeled data extensively to develop an ASR system for low-resource children's speech. The state-of-the-art wav2vec 2.0 model is the baseline ASR technique used here. The baseline's performance is further enhanced with the intuition of Noisy Student Teacher (NST) learning. The proposed technique is not limited to introducing soft labels (i.e., word-level transcriptions) for the unlabeled data; it also adapts the learning of the teacher model, or of the preceding student model, which significantly reduces redundant training. To that end, a detailed analysis is reported in this paper, as teacher and student learning differ. In the ASR experiments, character-level tokenization was used and, hence, the Connectionist Temporal Classification (CTC) loss was used for fine-tuning. Due to computational limitations, experiments were performed with approximately 12 hours of training data, and 5 hours of development and test data from the standard My Science Tutor (My-ST) corpus. The baseline wav2vec 2.0 achieves 34% WER, which the proposed approach improves by 10% relative. Furthermore, the performance loss and the effect of a language model are analyzed in detail.
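For readers unfamiliar with the CTC loss mentioned in the abstract, the following is a minimal, illustrative pure-Python sketch of the CTC forward (alpha) recursion that underlies it. This is not the authors' code (the paper fine-tunes wav2vec 2.0 with a framework-level CTC loss); the function name and tiny example are hypothetical, chosen only to show how CTC sums over all frame-level alignments of a character sequence.

```python
import math

def ctc_log_prob(log_probs, target, blank=0):
    """Forward (alpha) recursion of CTC: log P(target | log_probs).
    log_probs: per-frame lists of log-probabilities over the vocabulary.
    target: label sequence as ints, with no blanks."""
    # Extended label sequence with blanks: b, y1, b, y2, ..., b
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)
    NEG = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG:
            return NEG
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # Initialization: start in the leading blank or the first label.
    alpha = [NEG] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG] * S
        for s in range(S):
            cands = [alpha[s]]            # stay on the same symbol
            if s - 1 >= 0:
                cands.append(alpha[s - 1])  # advance by one
            # Skip the intermediate blank when adjacent labels differ.
            if s - 2 >= 0 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid endings: final label or trailing blank.
    return logsumexp(alpha[S - 1], alpha[S - 2]) if S > 1 else alpha[-1]
```

For example, with 2 frames, a vocabulary of {blank, 'a'} at uniform probability 0.5, and target ['a'], the alignments a·a, a·blank, and blank·a all collapse to "a", so P = 3 x 0.25 = 0.75; the recursion above reproduces this.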
Pages: 5
Related Papers
50 items in total
  • [31] DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
    Fan, Ruchao
    Alwan, Abeer
    INTERSPEECH 2022, 2022, : 4900 - 4904
  • [32] SEMI-SUPERVISED SINGING VOICE SEPARATION WITH NOISY SELF-TRAINING
    Wang, Zhepei
    Giri, Ritwik
    Isik, Umut
    Valin, Jean-Marc
    Krishnaswamy, Arvindh
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 31 - 35
  • [33] CONTRASTIVE SEMI-SUPERVISED LEARNING FOR ASR
    Xiao, Alex
    Fuegen, Christian
    Mohamed, Abdelrahman
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3870 - 3874
  • [34] Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
    Cao, Songjun
    Kang, Yueteng
    Fu, Yanzhe
    Xu, Xiaoshuo
    Sun, Sining
    Zhang, Yike
    Ma, Long
    INTERSPEECH 2021, 2021, : 706 - 710
  • [35] The student-teacher framework guided by self-training and consistency regularization for semi-supervised medical image segmentation
    Li, Boliang
    Xu, Yaming
    Wang, Yan
    Li, Luxiu
    Zhang, Bo
    PLOS ONE, 2024, 19 (04):
  • [36] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [37] Perceptive Self-Supervised Learning Network for Noisy Image Watermark Removal
    Tian, Chunwei
    Zheng, Menghua
    Li, Bo
    Zhang, Yanning
    Zhang, Shichao
    Zhang, David
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7069 - 7079
  • [38] Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches
    Sanada, Yutaro
    Nakagawa, Takumi
    Wada, Yuichiro
    Takanashi, Kosaku
    Zhang, Yuhui
    Tokuyama, Kiichi
    Kanamori, Takafumi
    Yamada, Tomonori
    INTERSPEECH 2022, 2022, : 1178 - 1182
  • [39] Self-Supervised Learning in the Twilight of Noisy Real-World Datasets
    Tendle, Atharva
    Little, Andrew
    Scott, Stephen
    Hasan, Mohammad Rashedul
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 461 - 464
  • [40] A Novel Self-Supervised Re-labeling Approach for Training with Noisy Labels
    Mandal, Devraj
    Bharadwaj, Shrisha
    Biswas, Soma
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1370 - 1379