COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH

Times Cited: 1
Authors
Poncelet, Jakob [1]
Van hamme, Hugo [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT PSI, Kasteelpk Arenberg 10, Bus 2441, B-3001 Leuven, Belgium
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
speech recognition; self-supervised learning; pre-training; cross-lingual;
DOI
10.1109/ASRU51503.2021.9688061
Chinese Library Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recent research in speech processing shows growing interest in unsupervised and self-supervised representation learning from unlabelled data to alleviate the need for large amounts of annotated data. We investigate several popular pre-training methods and apply them to Flemish Dutch. We compare off-the-shelf English pre-trained models to models trained on an increasing amount of Flemish data. We find that the most important factors for positive transfer to downstream speech recognition tasks are a substantial amount of data and a matching pre-training domain; ideally, we also finetune on an annotated subset in the target language. All pre-trained models improve linear phone separability in Flemish, but not all methods improve automatic speech recognition. We observe the best performance with wav2vec 2.0 and obtain a 30% WER improvement by finetuning the multilingually pre-trained XLSR-53 model on Flemish Dutch, after integrating it into an HMM-DNN acoustic model.
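As a rough illustration of the downstream use described in the abstract, the sketch below extracts frame-level representations from the multilingual XLSR-53 wav2vec 2.0 checkpoint using the Hugging Face transformers library. This is an assumption for illustration only: the checkpoint name, the torchaudio-based loading and the example file name are not taken from the paper, and the record does not state which toolchain the authors actually used.

    # Illustrative sketch only (not the authors' pipeline): extract frame-level
    # representations from the multilingual XLSR-53 wav2vec 2.0 checkpoint, e.g.
    # to feed a linear phone probe or an HMM-DNN hybrid acoustic model.
    # The checkpoint name and the file "flemish_utterance.wav" are assumptions.
    import torch
    import torchaudio
    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
    model.eval()

    # Load a waveform and resample to the 16 kHz rate expected by wav2vec 2.0.
    waveform, sample_rate = torchaudio.load("flemish_utterance.wav")  # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)                     # down-mix to mono
    if sample_rate != 16000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

    # Zero-mean, unit-variance normalization, as used when pre-training this model.
    input_values = (waveform - waveform.mean()) / (waveform.std() + 1e-7)

    with torch.no_grad():
        outputs = model(input_values)

    # (num_frames, hidden_dim) contextual features, roughly one frame per 20 ms.
    features = outputs.last_hidden_state.squeeze(0)
    print(features.shape)

In the setup described in the abstract, such features would either feed a simple linear classifier to measure phone separability or replace conventional acoustic features as input to the HMM-DNN hybrid system.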
Pages: 169-176
Number of Pages: 8