CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

Cited by: 0
Authors
Jia, Ye [1]
Ramanovich, Michelle Tadmor [1]
Wang, Quan [1]
Zen, Heiga [1]
Affiliations
[1] Google Research, Mountain View, CA, USA
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
speech-to-speech translation; speech-to-text translation; multilingual; cross-lingual voice transferring;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice (Ardila et al., 2020) speech corpus and the CoVoST 2 (Wang et al., 2021b) speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of translation speech in English are provided: 1) CVSS-C: All the translation speech is in a single high-quality canonical voice; 2) CVSS-T: The translation speech is in voices transferred from the corresponding source speech. In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech. On each version of CVSS, we built baseline multilingual direct S2ST models and cascade S2ST models, verifying the effectiveness of the corpus. To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous state-of-the-art trained on the corpus without extra data by 5.8 BLEU. Nevertheless, the performance of the direct S2ST models approaches the strong cascade baselines when trained from scratch, and with only 0.1 or 0.7 BLEU difference on ASR transcribed translation when initialized from matching ST models.
Pages: 6691-6703
Page count: 13
Related papers
57 in total
[1] [Anonymous], 2003, 8 EUR C SPEECH COMM
[2] Ardila R, 2020, PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), P4218
[3] Babu A., 2021, ARXIV211109296
[4] Bansal S., 2019, P C N AM CHAPT ASS C
[5] Bendazzoli C., 2005, P M CURIE EUROCONFER
[6] Boito M. Z., 2020, P LANG RES EV C LREC
[7] Brandschain L., 2008, P LANG RES EV C LREC
[8] Cieri C., 2004, P LANG RES EV C LREC
[9] Cieri C., 2007, P INT
[10] Devlin J., 2018, P C N AM CHAPT ASS C, P1