COMBINING UNSUPERVISED AND TEXT AUGMENTED SEMI-SUPERVISED LEARNING FOR LOW RESOURCED AUTOREGRESSIVE SPEECH RECOGNITION

Cited by: 1

Authors:

Li, Chak-Fai [1]

Keith, Francis [1]

Hartmann, William [1]

Snover, Matthew [1]

Affiliation:

[1] Raytheon BBN Technol, Cambridge, MA 02138 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

seq2seq; unsupervised learning; semi-supervised training; domain adaptation; representation

DOI:

10.1109/ICASSP43922.2022.9747005

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Recent advances in unsupervised representation learning have demonstrated the impact of pretraining on large amounts of read speech. We adapt these techniques for domain adaptation in conversational and broadcast domains that are low-resource in terms of both data and compute. Moving beyond CTC, we pretrain state-of-the-art Conformer models in an unsupervised manner. While the unsupervised approach outperforms traditional semi-supervised training, the techniques are complementary. Combining the techniques yields a 5% absolute improvement in WER, averaged over all conditions, compared to semi-supervised training alone. Additional text data is incorporated through external language models. By using CTC-based decoding, we are better able to take advantage of the additional text data. When the CTC-based decoder is used as the transcription model for semi-supervised training, the Conformer model incorporates knowledge from the language model more effectively than it does through shallow fusion. Final WER improves by an additional 2% absolute when CTC-based decoding, rather than shallow fusion, is used for semi-supervised training.
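
For readers unfamiliar with shallow fusion, the baseline the abstract compares against, the following is a minimal sketch of the general idea: each hypothesis is ranked by its ASR log-probability plus a weighted external language model log-probability. It is not the authors' implementation. The function names, the `lm_weight` value, and the example scores are all hypothetical and chosen only for illustration.

```python
def shallow_fusion_score(asr_logprob, lm_logprob, lm_weight=0.3):
    """Combine ASR and external LM log-probabilities (hypothetical helper).

    score = log p_asr(y | x) + lm_weight * log p_lm(y)
    """
    return asr_logprob + lm_weight * lm_logprob


def pick_best(hypotheses, lm_weight=0.3):
    """Return the text of the hypothesis with the highest fused score.

    `hypotheses` is a list of (text, asr_logprob, lm_logprob) tuples.
    """
    best = max(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
    )
    return best[0]


# Illustrative, made-up scores: the second hypothesis is slightly preferred
# by the ASR model alone, but the language model strongly prefers the first.
example_hypotheses = [
    ("the cat sat", -4.0, -6.0),
    ("the cat sad", -3.8, -9.0),
]

best_with_lm = pick_best(example_hypotheses, lm_weight=0.3)
best_without_lm = pick_best(example_hypotheses, lm_weight=0.0)
```

With `lm_weight=0.0` the ranking reduces to the ASR score alone, which makes it easy to see how much the language model changes the choice in this toy example.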

Pages: 6892-6896

Page count: 5
