COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH

Cited: 1
Authors
Poncelet, Jakob [1 ]
Hamme, Hugo Van [1 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT PSI, Kasteelpk Arenberg 10, Bus 2441, B-3001 Leuven, Belgium
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
speech recognition; self-supervised learning; pre-training; cross-lingual
DOI
10.1109/ASRU51503.2021.9688061
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent research in speech processing exhibits a growing interest in unsupervised and self-supervised representation learning from unlabelled data, to alleviate the need for large amounts of annotated data. We investigate several popular pre-training methods and apply them to Flemish Dutch. We compare off-the-shelf English pre-trained models to models trained on an increasing amount of Flemish data. We find that the most important factors for positive transfer to downstream speech recognition tasks include a substantial amount of data and a matching pre-training domain. Ideally, we also finetune on an annotated subset in the target language. All pre-trained models improve linear phone separability in Flemish, but not all methods improve automatic speech recognition. We observe the best performance with wav2vec 2.0, obtaining a 30% WER improvement by finetuning the multilingually pre-trained XLSR-53 model on Flemish Dutch and then integrating it into an HMM-DNN acoustic model.
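To make the headline result concrete, the sketch below shows the general kind of CTC finetuning described in the abstract: starting from the multilingually pre-trained XLSR-53 checkpoint and attaching a target-language output head. This is a minimal illustration using the Hugging Face `transformers` library, not the authors' actual pipeline; the vocabulary file "vocab.json", the `waveform`/`transcript` variables, and the learning rate are hypothetical placeholders, and the paper's integration of the finetuned network into an HMM-DNN acoustic model is omitted.

```python
# Minimal sketch (assumptions, not the paper's pipeline): CTC finetuning of
# the pre-trained XLSR-53 checkpoint on labelled target-language audio.
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# "vocab.json" is a hypothetical character vocabulary built from the
# transcripts of the annotated Flemish Dutch finetuning subset.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the pre-trained encoder and attach a randomly initialised CTC head
# sized to the target-language vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # keep the convolutional front-end fixed
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def train_step(waveform, transcript):
    """One gradient step on a single (16 kHz waveform, transcript) pair."""
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    labels = tokenizer(transcript, return_tensors="pt").input_ids
    loss = model(input_values=inputs.input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Freezing the convolutional feature encoder while updating the transformer layers is common practice when finetuning wav2vec 2.0-style models on small labelled sets; in the paper, the finetuned network is subsequently used inside an HMM-DNN acoustic model rather than decoded directly with CTC.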
Pages: 169-176
Page count: 8
Related Papers
(50 records in total)
  • [21] Joint Encoder-Decoder Self-Supervised Pre-training for ASR
    Arunkumar, A.
    Umesh, S.
    INTERSPEECH 2022, 2022: 3418-3422
  • [22] ENHANCING THE DOMAIN ROBUSTNESS OF SELF-SUPERVISED PRE-TRAINING WITH SYNTHETIC IMAGES
    Hassan, Mohamad N. C.
    Bhattacharya, Avigyan
    da Costa, Victor G. Turrisi
    Banerjee, Biplab
    Ricci, Elisa
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024: 5470-5474
  • [23] Individualized Stress Mobile Sensing Using Self-Supervised Pre-Training
    Islam, Tanvir
    Washington, Peter
    APPLIED SCIENCES-BASEL, 2023, 13 (21)
  • [24] Progressive self-supervised learning: A pre-training method for crowd counting
    Gu, Yao
    Zheng, Zhe
    Wu, Yingna
    Xie, Guangping
    Ni, Na
    PATTERN RECOGNITION LETTERS, 2025, 188: 148-154
  • [25] Class incremental learning with self-supervised pre-training and prototype learning
    Liu, Wenzhuo
    Wu, Xin-Jian
    Zhu, Fei
    Yu, Ming-Ming
    Wang, Chuang
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2025, 157
  • [26] LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
    Qu, Leyuan
    Weber, Cornelius
    Wermter, Stefan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02): 2772-2782
  • [27] DenseCL: A simple framework for self-supervised dense visual pre-training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    VISUAL INFORMATICS, 2023, 7 (01): 30-40
  • [28] Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection
    Shim, Hye-jin
    Heo, Hee-Soo
    Jung, Jee-weon
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020: 1091-1095
  • [29] MULTI-TASK SELF-SUPERVISED PRE-TRAINING FOR MUSIC CLASSIFICATION
    Wu, Ho-Hsiang
    Kao, Chieh-Chi
    Tang, Qingming
    Sun, Ming
    McFee, Brian
    Bello, Juan Pablo
    Wang, Chao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 556-560
  • [30] A NOISE-ROBUST SELF-SUPERVISED PRE-TRAINING MODEL BASED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION
    Zhu, Qiu-Shi
    Zhang, Jie
    Zhang, Zi-Qiang
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 3174-3178