Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

Cited by: 20
Authors
Xu, Qiantong [1 ,2 ]
Baevski, Alexei [1 ]
Auli, Michael [1 ]
Affiliations
[1] Meta AI, New York, NY 10003 USA
[2] Sambanova Syst, Palo Alto, CA 94303 USA
Source
INTERSPEECH 2022 | 2022
Keywords
zero-shot transfer learning; cross-lingual; phoneme recognition; multilingual ASR;
DOI
10.21437/Interspeech.2022-60
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Code
070206 ; 082403 ;
Abstract
Recent progress in self-training, self-supervised pretraining, and unsupervised learning has enabled well-performing speech recognition systems without any labeled data. However, in many cases labeled data is available for related languages and goes unused by these methods. This paper extends previous work on zero-shot cross-lingual transfer learning by fine-tuning a multilingually pretrained wav2vec 2.0 model to transcribe unseen languages. This is done by mapping phonemes of the training languages to the target language using articulatory features. Experiments show that this simple method significantly outperforms prior work which introduced task-specific architectures and used only part of a monolingually pretrained model.
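As a rough, self-contained sketch of the phoneme-mapping idea summarized in the abstract (not the authors' implementation), the snippet below maps each phoneme the fine-tuned model can emit to the nearest phoneme in the unseen target language's inventory, using distance between articulatory feature vectors. The feature table, inventories, and helper names are illustrative assumptions; a real system would use a full articulatory feature set rather than this toy one.

```python
# Illustrative sketch of articulatory-feature-based phoneme mapping.
# The feature table and inventories are made-up placeholders, not the
# feature set or language inventories used in the paper.

# phoneme -> binary articulatory features
# (voiced, bilabial, alveolar, velar, nasal, plosive, fricative)
FEATURES = {
    "p": (0, 1, 0, 0, 0, 1, 0),
    "b": (1, 1, 0, 0, 0, 1, 0),
    "t": (0, 0, 1, 0, 0, 1, 0),
    "d": (1, 0, 1, 0, 0, 1, 0),
    "k": (0, 0, 0, 1, 0, 1, 0),
    "g": (1, 0, 0, 1, 0, 1, 0),
    "m": (1, 1, 0, 0, 1, 0, 0),
    "n": (1, 0, 1, 0, 1, 0, 0),
    "s": (0, 0, 1, 0, 0, 0, 1),
    "z": (1, 0, 1, 0, 0, 0, 1),
}

def hamming(a, b):
    """Count the articulatory features on which two phonemes disagree."""
    return sum(x != y for x, y in zip(a, b))

def map_phoneme(train_phone, target_inventory):
    """Map a phoneme seen during fine-tuning to the closest phoneme
    in the unseen target language's inventory."""
    return min(target_inventory,
               key=lambda p: hamming(FEATURES[train_phone], FEATURES[p]))

if __name__ == "__main__":
    training_inventory = ["b", "d", "g", "z", "m"]  # phonemes the model can emit
    target_inventory = ["p", "t", "k", "s", "n"]    # unseen target-language phonemes
    mapping = {ph: map_phoneme(ph, target_inventory) for ph in training_inventory}
    print(mapping)  # {'b': 'p', 'd': 't', 'g': 'k', 'z': 's', 'm': 'n'}
```

With such a mapping, the model's phoneme outputs can be relabeled into the target language's inventory at inference time, which is the essence of the zero-shot transfer described above.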
Pages: 2113-2117
Number of pages: 5