COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH

Cited: 1
Authors
Poncelet, Jakob [1 ]
Hamme, Hugo Van [1 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT PSI, Kasteelpk Arenberg 10, Bus 2441, B-3001 Leuven, Belgium
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
speech recognition; self-supervised learning; pre-training; cross-lingual
DOI
10.1109/ASRU51503.2021.9688061
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent research in speech processing exhibits a growing interest in unsupervised and self-supervised representation learning from unlabelled data, to alleviate the need for large amounts of annotated data. We investigate several popular pre-training methods and apply them to Flemish Dutch. We compare off-the-shelf English pre-trained models to models trained on an increasing amount of Flemish data. We find that the most important factors for positive transfer to downstream speech recognition tasks include a substantial amount of data and a matching pre-training domain. Ideally, we also finetune on an annotated subset in the target language. All pre-trained models improve linear phone separability in Flemish, but not all methods improve automatic speech recognition. We observe the best performance with wav2vec 2.0, obtaining a 30% WER improvement by finetuning the multilingually pre-trained XLSR-53 model on Flemish Dutch and then integrating it into an HMM-DNN acoustic model.
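To make the headline result concrete, the sketch below shows the general kind of CTC finetuning described in the abstract: starting from the multilingually pre-trained XLSR-53 checkpoint and attaching a target-language output head. This is a minimal illustration using the Hugging Face `transformers` library, not the authors' actual pipeline; the vocabulary file "vocab.json", the `waveform`/`transcript` variables, and the learning rate are hypothetical placeholders, and the paper's integration of the finetuned network into an HMM-DNN acoustic model is omitted.

```python
# Minimal sketch (assumptions, not the paper's pipeline): CTC finetuning of
# the pre-trained XLSR-53 checkpoint on labelled target-language audio.
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# "vocab.json" is a hypothetical character vocabulary built from the
# transcripts of the annotated Flemish Dutch finetuning subset.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the pre-trained encoder and attach a randomly initialised CTC head
# sized to the target-language vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # keep the convolutional front-end fixed
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def train_step(waveform, transcript):
    """One gradient step on a single (16 kHz waveform, transcript) pair."""
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    labels = tokenizer(transcript, return_tensors="pt").input_ids
    loss = model(input_values=inputs.input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Freezing the convolutional feature encoder while updating the transformer layers is common practice when finetuning wav2vec 2.0-style models on small labelled sets; in the paper, the finetuned network is subsequently used inside an HMM-DNN acoustic model rather than decoded directly with CTC.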
Pages: 169-176
Page count: 8
Related Papers
(50 records in total)
  • [21] Joint Encoder-Decoder Self-Supervised Pre-training for ASR
    Arunkumar, A.
    Umesh, S.
    INTERSPEECH 2022, 2022: 3418-3422
  • [22] ENHANCING THE DOMAIN ROBUSTNESS OF SELF-SUPERVISED PRE-TRAINING WITH SYNTHETIC IMAGES
    Hassan, Mohamad N. C.
    Bhattacharya, Avigyan
    da Costa, Victor G. Turrisi
    Banerjee, Biplab
    Ricci, Elisa
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024: 5470-5474
  • [23] Individualized Stress Mobile Sensing Using Self-Supervised Pre-Training
    Islam, Tanvir
    Washington, Peter
    APPLIED SCIENCES-BASEL, 2023, 13 (21)
  • [24] Progressive self-supervised learning: A pre-training method for crowd counting
    Gu, Yao
    Zheng, Zhe
    Wu, Yingna
    Xie, Guangping
    Ni, Na
    PATTERN RECOGNITION LETTERS, 2025, 188: 148-154
  • [25] Class incremental learning with self-supervised pre-training and prototype learning
    Liu, Wenzhuo
    Wu, Xin-Jian
    Zhu, Fei
    Yu, Ming-Ming
    Wang, Chuang
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2025, 157
  • [26] LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
    Qu, Leyuan
    Weber, Cornelius
    Wermter, Stefan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02): 2772-2782
  • [27] DenseCL: A simple framework for self-supervised dense visual pre-training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    VISUAL INFORMATICS, 2023, 7 (01): 30-40
  • [28] Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection
    Shim, Hye-jin
    Heo, Hee-Soo
    Jung, Jee-weon
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020: 1091-1095
  • [29] MULTI-TASK SELF-SUPERVISED PRE-TRAINING FOR MUSIC CLASSIFICATION
    Wu, Ho-Hsiang
    Kao, Chieh-Chi
    Tang, Qingming
    Sun, Ming
    McFee, Brian
    Bello, Juan Pablo
    Wang, Chao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 556-560
  • [30] A NOISE-ROBUST SELF-SUPERVISED PRE-TRAINING MODEL BASED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION
    Zhu, Qiu-Shi
    Zhang, Jie
    Zhang, Zi-Qiang
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 3174-3178