From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation

Cited by: 1
Authors
Liu, Danni [1]
Wang, Changhan [2]
Gong, Hongyu [2]
Ma, Xutai [2, 3]
Tang, Yun [2]
Pino, Juan [2]
Affiliations
[1] Maastricht University, Maastricht, Netherlands
[2] Meta AI, Menlo Park, CA, USA
[3] Johns Hopkins University, Baltimore, MD 21218, USA
Source
INTERSPEECH 2022 | 2022
Keywords
speech translation; text-to-speech; low-latency
DOI
10.21437/Interspeech.2022-10568
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech-to-speech translation (S2ST) converts input speech to speech in another language. A challenge of delivering S2ST in real time is the accumulated delay between the translation and speech synthesis modules. While incremental text-to-speech (iTTS) models have recently shown large quality improvements, they typically require additional future text inputs to reach optimal performance. In this work, we minimize the initial waiting time of iTTS by adapting the upstream speech translator to generate high-quality pseudo lookahead for the speech synthesizer. After mitigating the initial delay, we demonstrate that the duration of synthesized speech also plays a crucial role in latency. We formalize this as a latency metric and then present a simple yet effective duration-scaling approach for latency reduction. Our approaches consistently reduce latency by 0.2-0.5 seconds without sacrificing speech translation quality.
Pages: 1771-1775
Page count: 5
Related papers
45 in total
  • [1] SIMULTANEOUS SPEECH-TO-SPEECH TRANSLATION SYSTEM WITH TRANSFORMER-BASED INCREMENTAL ASR, MT, AND TTS
    Fukuda, Ryo
    Novitasari, Sashi
    Oka, Yui
    Kano, Yasumasa
    Yano, Yuki
    Ko, Yuka
    Tokuyama, Hirotaka
    Doi, Kosuke
    Yanagita, Tomoya
    Sakti, Sakriani
    Sudoh, Katsuhito
    Nakamura, Satoshi
    2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 186 - 192
  • [2] ASSESSING EVALUATION METRICS FOR SPEECH-TO-SPEECH TRANSLATION
    Salesky, Elizabeth
    Maeder, Julian
    Klinger, Severin
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 733 - 740
  • [3] Stress Transfer in Speech-to-Speech Machine Translation
    Akarsh, Sai
    Narasinga, Vamshiraghusimha
    Vuppala, Anil Kumar
    INTERSPEECH 2024, 2024, : 995 - 996
  • [4] INTENT TRANSFER IN SPEECH-TO-SPEECH MACHINE TRANSLATION
    Anumanchipalli, Gopala Krishna
    Oliveira, Luis C.
    Black, Alan W.
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 153 - 158
  • [5] SPEECH-TO-SPEECH TRANSLATION BETWEEN UNTRANSCRIBED UNKNOWN LANGUAGES
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 593 - 600
  • [6] Rhonda: the architecture of a multilingual speech-to-speech translation pipeline
    Louw, Johannes A.
    Moodley, Avashlin
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT AND INNOVATIVE COMPUTING APPLICATIONS (ICONIC), 2018, : 194 - 200
  • [7] AUTOMATIC PRONUNCIATION PREDICTION FOR TEXT-TO-SPEECH SYNTHESIS OF DIALECTAL ARABIC IN A SPEECH-TO-SPEECH TRANSLATION SYSTEM
    Ananthakrishnan, Sankaranarayanan
    Tsakalidis, Stavros
    Prasad, Rohit
    Natarajan, Prem
    Vembu, Aravind Namandi
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4957 - 4960
  • [8] Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation
    Akarsh, Sai C.
    Narasinga, Vamshiraghusimha
    Mondal, Anindita
    Vuppala, Anil
    2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024, 2024,
  • [9] Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
    Dong, Qianqian
    Yue, Fengpeng
    Ko, Tom
    Wang, Mingxuan
    Bai, Qibing
    Zhang, Yu
    INTERSPEECH 2022, 2022, : 1781 - 1785
  • [10] AwezaMed: A Multilingual, Multimodal Speech-To-Speech Translation Application for Maternal Health Care
    Marais, Laurette
    Louw, Johannes A.
    Badenhorst, Jaco
    Calteaux, Karen
    Wilken, Ilana
    van Niekerk, Nina
    Stein, Glenn
    PROCEEDINGS OF 2020 23RD INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2020), 2020, : 669 - 676