COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH

Times Cited: 1
Authors
Poncelet, Jakob [1]
Van hamme, Hugo [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT PSI, Kasteelpk Arenberg 10, Bus 2441, B-3001 Leuven, Belgium
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
speech recognition; self-supervised learning; pre-training; cross-lingual;
DOI
10.1109/ASRU51503.2021.9688061
Chinese Library Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recent research in speech processing shows growing interest in unsupervised and self-supervised representation learning from unlabelled data to alleviate the need for large amounts of annotated data. We investigate several popular pre-training methods and apply them to Flemish Dutch. We compare off-the-shelf English pre-trained models to models trained on an increasing amount of Flemish data. We find that the most important factors for positive transfer to downstream speech recognition tasks are a substantial amount of data and a matching pre-training domain; ideally, we also finetune on an annotated subset in the target language. All pre-trained models improve linear phone separability in Flemish, but not all methods improve automatic speech recognition. We observe the best performance with wav2vec 2.0 and obtain a 30% WER improvement by finetuning the multilingually pre-trained XLSR-53 model on Flemish Dutch, after integrating it into an HMM-DNN acoustic model.
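As a rough illustration of the downstream use described in the abstract, the sketch below extracts frame-level representations from the multilingual XLSR-53 wav2vec 2.0 checkpoint using the Hugging Face transformers library. This is an assumption for illustration only: the checkpoint name, the torchaudio-based loading and the example file name are not taken from the paper, and the record does not state which toolchain the authors actually used.

    # Illustrative sketch only (not the authors' pipeline): extract frame-level
    # representations from the multilingual XLSR-53 wav2vec 2.0 checkpoint, e.g.
    # to feed a linear phone probe or an HMM-DNN hybrid acoustic model.
    # The checkpoint name and the file "flemish_utterance.wav" are assumptions.
    import torch
    import torchaudio
    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
    model.eval()

    # Load a waveform and resample to the 16 kHz rate expected by wav2vec 2.0.
    waveform, sample_rate = torchaudio.load("flemish_utterance.wav")  # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)                     # down-mix to mono
    if sample_rate != 16000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

    # Zero-mean, unit-variance normalization, as used when pre-training this model.
    input_values = (waveform - waveform.mean()) / (waveform.std() + 1e-7)

    with torch.no_grad():
        outputs = model(input_values)

    # (num_frames, hidden_dim) contextual features, roughly one frame per 20 ms.
    features = outputs.last_hidden_state.squeeze(0)
    print(features.shape)

In the setup described in the abstract, such features would either feed a simple linear classifier to measure phone separability or replace conventional acoustic features as input to the HMM-DNN hybrid system.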
Pages: 169-176
Number of Pages: 8