Sequence-to-Sequence Models for Emphasis Speech Translation

Cited by: 11
Authors
Quoc Truong Do [1]
Sakti, Sakriani [1,2]
Nakamura, Satoshi [1,2]
Affiliations
[1] Nara Inst Sci & Technol, Ikoma 6300192, Japan
[2] RIKEN Ctr Adv Intelligence Project AIP, Ikoma 6300192, Japan
Keywords
Emphasis estimation; emphasis translation; speech-to-speech translation (S2ST); joint optimization of words and emphasis; GENERATION
DOI
10.1109/TASLP.2018.2846402
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech-to-speech translation (S2ST) systems can break language barriers in cross-lingual communication by translating speech across languages. Recent studies have introduced many improvements that allow existing S2ST systems to handle not only linguistic meaning but also paralinguistic information such as emphasis, by adding emphasis estimation and emphasis translation components. However, the approach used for emphasis translation is not optimal for sequence translation tasks and cannot easily handle the long-term dependencies between words and emphasis levels. It also requires quantizing emphasis levels and treats them as discrete labels rather than continuous values. Moreover, the whole translation pipeline is fairly complex and slow because all components are trained separately without joint optimization. In this paper, we make two contributions: 1) we propose an approach based on sequence-to-sequence models that can handle continuous emphasis levels, and 2) we combine machine translation and emphasis translation into a single model, which greatly simplifies the translation pipeline and makes it easier to perform joint optimization. Our results on an emphasis translation task indicate that our translation models outperform previous models by a large margin in both objective and subjective tests. Experiments on a joint translation model also show that our models can jointly translate words and emphasis with one-word delays instead of full-sentence delays while preserving the translation performance of both tasks.
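As a rough illustration of the joint approach described in the abstract (a single sequence-to-sequence model that predicts target words as discrete labels and emphasis as continuous values), the sketch below shows one possible encoder-decoder LSTM with two output heads and a combined loss. It is an assumption-laden sketch, not the authors' implementation: the class name JointEmphasisSeq2Seq, the layer sizes, and the loss weight alpha are all illustrative choices.

```python
# Minimal sketch (assumed architecture, not the paper's exact model) of joint
# word-and-emphasis sequence-to-sequence translation with continuous emphasis.
import torch
import torch.nn as nn

class JointEmphasisSeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Encoder input: word embedding concatenated with its continuous emphasis level (+1 dim).
        self.encoder = nn.LSTM(emb_dim + 1, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.word_head = nn.Linear(hid_dim, tgt_vocab)  # discrete target-word prediction
        self.emph_head = nn.Linear(hid_dim, 1)          # continuous target-emphasis regression

    def forward(self, src_words, src_emph, tgt_words):
        # src_words: (B, S) int, src_emph: (B, S) float in [0, 1], tgt_words: (B, T) int.
        enc_in = torch.cat([self.src_emb(src_words), src_emph.unsqueeze(-1)], dim=-1)
        _, state = self.encoder(enc_in)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_words), state)
        return self.word_head(dec_out), self.emph_head(dec_out).squeeze(-1)

def joint_loss(word_logits, emph_pred, tgt_words, tgt_emph, alpha=0.5):
    # Joint optimization: cross-entropy over words plus MSE over continuous emphasis.
    ce = nn.functional.cross_entropy(word_logits.transpose(1, 2), tgt_words)
    mse = nn.functional.mse_loss(emph_pred, tgt_emph)
    return ce + alpha * mse
```

Training both heads against a single weighted loss is what allows the joint optimization mentioned in the abstract; the one-word-delay behaviour would additionally require an incremental decoding scheme, which is omitted here for brevity.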
Pages: 1873 - 1883
Page count: 11
Related papers
50 items in total
  • [1] Jia, Ye; Weiss, Ron J.; Biadsy, Fadi; Macherey, Wolfgang; Johnson, Melvin; Chen, Zhifeng; Wu, Yonghui. Direct speech-to-speech translation with a sequence-to-sequence model. INTERSPEECH 2019, 2019: 1123-1127.
  • [2] Quoc Truong Do; Sakti, Sakriani; Nakamura, Satoshi. Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis. INTERSPEECH 2017, 2017: 2640-2644.
  • [3] Prabhavalkar, Rohit; Rao, Kanishka; Sainath, Tara N.; Li, Bo; Johnson, Leif; Jaitly, Navdeep. A Comparison of Sequence-to-Sequence Models for Speech Recognition. INTERSPEECH 2017, 2017: 939-943.
  • [4] Yang, Gene-Ping; Tang, Hao. Supervised Attention in Sequence-to-Sequence Models for Speech Recognition. IEEE ICASSP 2022, 2022: 7222-7226.
  • [5] Weiss, Ron J.; Chorowski, Jan; Jaitly, Navdeep; Wu, Yonghui; Chen, Zhifeng. Sequence-to-Sequence Models Can Directly Translate Foreign Speech. INTERSPEECH 2017, 2017: 2625-2629.
  • [6] Chiu, Chung-Cheng; Sainath, Tara N.; Wu, Yonghui; Prabhavalkar, Rohit; Nguyen, Patrick; Chen, Zhifeng; Kannan, Anjuli; Weiss, Ron J.; Rao, Kanishka; Gonina, Ekaterina; Jaitly, Navdeep; Li, Bo; Chorowski, Jan; Bacchiani, Michiel. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. IEEE ICASSP 2018, 2018: 4774-4778.
  • [7] Unni, Vinit; Joshi, Nitish; Jyothi, Preethi. Coupled Training of Sequence-to-Sequence Models for Accented Speech Recognition. IEEE ICASSP 2020, 2020: 8254-8258.
  • [8] Ueno, Sei; Mimura, Masato; Sakai, Shinsuke; Kawahara, Tatsuya. Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition. Acoustical Science and Technology, 2021, 42(6): 333-343.
  • [9] Peters, Ben; Niculae, Vlad; Martins, Andre F. T. Sparse Sequence-to-Sequence Models. ACL 2019, 2019: 1504-1519.
  • [10] Bahar, Parnia; Zeyer, Albert; Schlueter, Ralf; Ney, Hermann. On Using 2D Sequence-to-Sequence Models for Speech Recognition. IEEE ICASSP 2019, 2019: 5671-5675.