SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Authors
Ren, Yi [1]
Liu, Jinglin [1]
Tan, Xu [2]
Zhang, Chen [1]
Qin, Tao [2]
Zhao, Zhou [1]
Liu, Tie-Yan [2]
Affiliations
[1] Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
[2] Microsoft Research, Redmond, WA, USA
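Funding
National Natural Science Foundation of China; National Key Research and Development Program of China; Zhejiang Provincial Natural Science Foundation;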
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we develop SimulSpeech, an end-to-end simultaneous speech-to-text translation system that translates speech in a source language into text in a target language concurrently. SimulSpeech consists of a speech encoder, a speech segmenter, and a text decoder, where 1) the segmenter builds upon the encoder and leverages a connectionist temporal classification (CTC) loss to split the input streaming speech in real time, and 2) the encoder-decoder attention adopts a wait-k strategy for simultaneous translation. SimulSpeech is more challenging than previous cascaded systems (which combine simultaneous automatic speech recognition (ASR) with simultaneous neural machine translation (NMT)). We introduce two novel knowledge distillation methods to ensure its performance: 1) attention-level knowledge distillation transfers knowledge from the product of the attention matrices of the simultaneous NMT and ASR models to help train the attention mechanism in SimulSpeech; 2) data-level knowledge distillation transfers knowledge from the full-sentence NMT model and also reduces the complexity of the data distribution to help the optimization of SimulSpeech. Experiments on the MuST-C English-Spanish and English-German spoken language translation datasets show that SimulSpeech achieves reasonable BLEU scores and lower delay compared to full-sentence end-to-end speech-to-text translation (without simultaneous translation), and better performance than the two-stage cascaded simultaneous translation model in terms of both BLEU scores and translation delay.
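The two attention-related ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the nested-list matrix representation, and the assumed matrix shapes (NMT attention as target tokens by source words, ASR attention as source words by speech frames) are assumptions made for illustration only. The wait-k rule used here is the common formulation in which the decoder has read k source segments before emitting its first target token and reads one more segment per subsequent token.

```python
def wait_k_visible(target_len, source_len, k):
    """Number of source segments visible when each target token is emitted.

    Assumes 0-indexed target positions: token t may attend to the first
    min(k + t, source_len) source segments.
    """
    return [min(k + t, source_len) for t in range(target_len)]


def wait_k_mask(target_len, source_len, k):
    """Boolean mask; mask[t][j] is True if target token t may attend to segment j."""
    visible = wait_k_visible(target_len, source_len, k)
    return [[j < g for j in range(source_len)] for g in visible]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[m] * b[m][c] for m in range(inner)) for c in range(cols)]
        for row in a
    ]


def teacher_attention(nmt_attn, asr_attn):
    """Hypothetical teacher signal for attention-level distillation.

    nmt_attn: target tokens x source words (assumed shape)
    asr_attn: source words x speech frames (assumed shape)
    Returns a target tokens x speech frames matrix.
    """
    return matmul(nmt_attn, asr_attn)


if __name__ == "__main__":
    # With k = 2 and three source segments, the decoder starts after two
    # segments and then reads one more per emitted token until none remain.
    for row in wait_k_mask(target_len=4, source_len=3, k=2):
        print(row)

    nmt = [[1.0, 0.0], [0.5, 0.5]]            # 2 target tokens x 2 source words
    asr = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]  # 2 source words x 3 speech frames
    print(teacher_attention(nmt, asr))
```

If both input matrices have rows that sum to one, the product also has rows that sum to one, so it can be read as an attention distribution from target tokens over speech frames. How that product is turned into a training loss for SimulSpeech is not specified in the abstract and is left out of this sketch.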
Pages: 3787-3796
Page count: 10