Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Cited by: 0
Authors
Zhang, Bowen [1 ]
Cao, Songjun [2 ]
Zhang, Xiaoming [2 ]
Zhang, Yike [2 ]
Ma, Long [2 ]
Shinozaki, Takahiro [1 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Tencent Cloud Xiaowei, Beijing, Peoples R China
Source
INTERSPEECH 2022, 2022
Keywords
semi-supervised learning; speech recognition; self-training; pseudo-labeling; curriculum learning
DOI
10.21437/Interspeech.2022-10226
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain insufficiently studied. Moreover, modern semi-supervised speech recognition algorithms either treat unlabeled data indiscriminately or filter out noisy samples with a confidence threshold, ignoring the dissimilarities among different unlabeled samples. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training that maximizes the utilization of unlabeled data. The pre-training stage of Censer adopts wav2vec2.0, and the fine-tuning stage employs an improved semi-supervised learning algorithm derived from slimIPL, which incorporates unlabeled data progressively according to the quality of their pseudo labels. We also introduce a temporal pseudo-label pool and an exponential moving average to control the update frequency of the pseudo labels and to avoid model divergence. Experimental results on the Libri-Light and LibriSpeech datasets show that our proposed method achieves better performance than existing approaches while being more unified.
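
A minimal, illustrative Python sketch of the fine-tuning loop the abstract describes: a student model (initialized from wav2vec2.0 pre-training) alternates supervised steps with steps on pseudo labels drawn from a bounded temporal pool, the pool is filled by a slowly updated EMA teacher, and a curriculum schedule admits progressively lower-quality pseudo labels as training proceeds. All names here (TinyModel, censer_like_finetune) and the use of decoder confidence as the pseudo-label quality measure are assumptions for illustration, not the authors' implementation.

import random
from collections import deque

class TinyModel:
    """Stand-in for a wav2vec2.0-initialized ASR model (illustrative only)."""
    def __init__(self):
        self.weights = {"w": 0.0}

    def transcribe(self, utterance):
        # Placeholder decode: returns (hypothesis, confidence in [0, 1]).
        # A real system would beam-search and score the hypothesis.
        return "hyp:" + utterance, random.random()

    def train_step(self, utterance, label):
        # Placeholder gradient step on one (utterance, label) pair.
        self.weights["w"] += 0.01


def censer_like_finetune(labeled, unlabeled, steps=1000, pool_size=100,
                         ema_decay=0.999, start_ratio=0.3, end_ratio=1.0):
    student, teacher = TinyModel(), TinyModel()
    pool = deque(maxlen=pool_size)  # temporal pool of (utt, pseudo label, conf)

    for step in range(steps):
        # Supervised step on a labeled pair.
        utt, ref = random.choice(labeled)
        student.train_step(utt, ref)

        # The slowly moving EMA teacher produces pseudo labels, which enter a
        # bounded temporal pool; maxlen limits how often labels regenerate.
        u = random.choice(unlabeled)
        hyp, conf = teacher.transcribe(u)
        pool.append((u, hyp, conf))

        # Curriculum: admit an increasing fraction of the pool over training,
        # highest-confidence (presumed easiest) pseudo labels first.
        ratio = start_ratio + (end_ratio - start_ratio) * step / max(1, steps - 1)
        ranked = sorted(pool, key=lambda item: item[2], reverse=True)
        admitted = ranked[: max(1, int(ratio * len(ranked)))]

        # Unsupervised step on one admitted pseudo-labeled utterance.
        pseudo_utt, pseudo_label, _ = random.choice(admitted)
        student.train_step(pseudo_utt, pseudo_label)

        # EMA update keeps the teacher a smoothed copy of the student,
        # damping the feedback loop that can make pure pseudo-labeling diverge.
        for k, w in student.weights.items():
            teacher.weights[k] = ema_decay * teacher.weights[k] + (1 - ema_decay) * w

    return student


if __name__ == "__main__":
    labeled = [("utt%d" % i, "ref%d" % i) for i in range(10)]
    unlabeled = ["uutt%d" % i for i in range(100)]
    censer_like_finetune(labeled, unlabeled, steps=50)

In this reading, the temporal pool and the EMA teacher are the two stabilizers the abstract names: the pool caps how frequently pseudo labels are refreshed, while the EMA decouples the label generator from the rapidly changing student.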
Pages: 2653-2657
Page count: 5