Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition

Times Cited: 2
Authors
Vanderreydt, Geoffroy [1]
Remy, Francois [1]
Demuynck, Kris [1]
Affiliations
[1] UGent - imec, IDLab, Ghent, Belgium
Source
INTERSPEECH 2022 | 2022
Keywords
Speech Recognition; ASR; CTC; XLS-R
DOI
10.21437/Interspeech.2022-10744
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained wav2vec2.0 model fine-tuned on a multi-lingual speech translation task. We show that the weights of this model form an excellent initialization for Connectionist Temporal Classification (CTC) speech recognition, a different but closely related task. We explore the benefits of this initialization for various languages, both in-domain and out-of-domain for the speech translation task. Our experiments on the CommonVoice dataset confirm that our approach performs significantly better in-domain, and is often better out-of-domain too. This method is particularly relevant for Automatic Speech Recognition (ASR) with limited data and/or compute budget during training.
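
A minimal sketch of the initialization idea described in the abstract, assuming the HuggingFace transformers API and the public XLS-R speech-translation checkpoints; the checkpoint names, vocabulary size, and strict=False weight copy below are illustrative assumptions, not the authors' released code:

    from transformers import (
        SpeechEncoderDecoderModel,
        Wav2Vec2Config,
        Wav2Vec2ForCTC,
    )

    # A multilingual speech-translation checkpoint built on XLS-R
    # (an encoder-decoder whose encoder is a wav2vec2 model);
    # checkpoint name assumed for illustration.
    st_model = SpeechEncoderDecoderModel.from_pretrained(
        "facebook/wav2vec2-xls-r-300m-21-to-en"
    )

    # Fresh CTC model with the same encoder architecture; vocab_size
    # is the character inventory of the target low-resource language
    # (40 is a placeholder).
    config = Wav2Vec2Config.from_pretrained(
        "facebook/wav2vec2-xls-r-300m",
        vocab_size=40,
        ctc_loss_reduction="mean",
    )
    asr_model = Wav2Vec2ForCTC(config)

    # Initialize the CTC model's wav2vec2 encoder from the translation
    # model's encoder; strict=False skips any adapter layers the
    # translation checkpoint adds on top of the base encoder.
    asr_model.wav2vec2.load_state_dict(
        st_model.encoder.state_dict(), strict=False
    )

    # The randomly initialized CTC head (asr_model.lm_head) is then
    # fine-tuned on the limited in-language ASR data as usual.

In this sketch the translation decoder is discarded and only the encoder weights transfer; the abstract does not specify whether any further components are reused, so that detail is an assumption of the sketch.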
Pages: 3053-3057
Number of Pages: 5