Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

Cited by: 0
Authors
Du, Yichao [1]
Zhang, Zhirui [2]
Wang, Weizhi [3]
Chen, Boxing [2]
Xie, Jun [2]
Xu, Tong [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Alibaba DAMO Acad, Machine Intelligence Technol Lab, Hangzhou, Peoples R China
[3] Rutgers State Univ, New Brunswick, NJ USA
Funding
National Natural Science Foundation of China;
Keywords
MODELS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular because of its potential for reduced error propagation, lower latency, and fewer parameters. Given a triplet training corpus (speech, transcription, translation), a conventional high-quality E2E-ST system uses the (speech, transcription) pairs to pre-train the model and then uses the (speech, translation) pairs to optimize it further. However, each stage of this process involves only paired data, and this loose coupling fails to fully exploit the associations within the triplet data. In this paper, we attempt to model the joint probability of transcription and translation conditioned on the speech input, so as to leverage the triplet data directly. Building on this, we propose a novel regularization method for model training that improves the agreement between the two decomposition paths of this joint probability, which should be equal in theory. To achieve this, we introduce two Kullback-Leibler divergence regularization terms into the training objective to reduce the mismatch between the output probabilities of the two paths. The trained model can then be naturally converted into an E2E-ST model by means of a pre-defined early-stop tag. Experiments on the MuST-C benchmark demonstrate that the proposed approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while also achieving better performance on the automatic speech recognition task.
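The dual-path decomposition mentioned in the abstract can be sketched with the chain rule of probability. The notation below is assumed for illustration and is not taken from the record: s is the speech input, x the transcription, y the translation, p_1 and p_2 stand for the distributions the two paths assign to the same outputs, and the weight \lambda is hypothetical. The second equation shows only the general shape of a likelihood objective with two KL agreement terms; the exact form used in the paper may differ.

```latex
% Assumed notation: s = speech, x = transcription, y = translation.
% Chain rule: the same joint probability factorizes along two paths.
\begin{align}
p(x, y \mid s) &= p(x \mid s)\, p(y \mid x, s) && \text{(transcribe, then translate)} \\
               &= p(y \mid s)\, p(x \mid y, s) && \text{(translate, then transcribe)}
\end{align}
% Hypothetical objective: one likelihood loss per path, plus two KL terms
% that penalize disagreement between the two paths' output distributions.
\begin{equation}
\mathcal{L} = \mathcal{L}_{\text{path 1}} + \mathcal{L}_{\text{path 2}}
  + \lambda \left[ \mathrm{KL}\!\left(p_{1} \,\|\, p_{2}\right)
  + \mathrm{KL}\!\left(p_{2} \,\|\, p_{1}\right) \right]
\end{equation}
```

Because both factorizations equal the same joint probability, any gap between them in a trained model reflects modeling error, which is what the KL terms are meant to shrink.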
Pages: 10590-10598
Number of pages: 9