Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Times cited: 3

Authors:

Dong, Qianqian [1]

Yue, Fengpeng [1,2]

Ko, Tom [1]

Wang, Mingxuan [1]

Bai, Qibing [1,2]

Zhang, Yu [2,3]

Affiliations:

[1] ByteDance AI Lab, Beijing, People's Republic of China

[2] Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, People's Republic of China

[3] Peng Cheng Laboratory, Shenzhen, People's Republic of China

Source:

INTERSPEECH 2022 | 2022

Keywords:

speech translation; speech-to-speech translation; pseudo-labeling

DOI:

10.21437/Interspeech.2022-10011

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging due to data scarcity and the complexity of the speech-to-speech mapping. In this paper, we report our recent progress on S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we leverage external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo-labeled data with a combination of popular techniques whose application to S2ST is non-trivial. Moreover, we evaluate our approach on both a syntactically similar language pair (Spanish-English) and a syntactically distant one (English-Chinese). Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.


Pages: 1781-1785

Page count: 5
