Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

Times cited: 0
Authors
Du, Yichao [1]
Zhang, Zhirui [2]
Wang, Weizhi [3]
Chen, Boxing [2]
Xie, Jun [2]
Xu, Tong [1]
Affiliations
[1] University of Science and Technology of China, Hefei, People's Republic of China
[2] Alibaba DAMO Academy, Machine Intelligence Technology Lab, Hangzhou, People's Republic of China
[3] Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
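Funding
National Natural Science Foundation of China
Keywords
MODELS
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract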
End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular due to its potential for less error propagation, lower latency, and fewer parameters. Given a triplet training corpus (speech, transcription, translation), a conventional high-quality E2E-ST system uses the (speech, transcription) pairs to pre-train the model and then the (speech, translation) pairs to optimize it further. However, each stage involves only two-tuple data, and this loose coupling fails to fully exploit the associations within the triplet data. In this paper, we model the joint probability of the transcription and translation given the speech input, so as to leverage the triplet data directly. On this basis, we propose a novel regularization method for model training that improves the agreement between the two decomposition paths of the triplet data, which should in theory be equal. To this end, we introduce two Kullback-Leibler divergence regularization terms into the training objective to reduce the mismatch between the output probabilities of the two paths. The well-trained model can then be naturally converted into an E2E-ST model by means of a predefined early-stop tag. Experiments on the MuST-C benchmark demonstrate that the proposed approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while also achieving better performance on the automatic speech recognition task.
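The "dual-path decomposition" in the abstract follows from the chain rule: writing x for the speech, z for the transcription, and y for the translation (notation chosen here, not taken from the record), the joint probability factors as p(z, y | x) = p(z | x) p(y | x, z) = p(y | x) p(z | x, y), so the two paths should agree in theory. The Python sketch below is a minimal, hypothetical illustration of how two KL terms could penalize disagreement between the paths on toy per-token distributions. Which distributions are compared, the direction of each KL, the averaging over tokens, the weight `alpha`, and every function name (`triangular_loss`, `mean_token_kl`, and so on) are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Dist = List[float]  # probability distribution over a (toy) vocabulary


def softmax(logits: Sequence[float]) -> Dist:
    """Numerically stable softmax for one decoding step."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q). Terms with p_i = 0 contribute nothing; eps guards q_i = 0."""
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def mean_token_kl(p_steps: List[Dist], q_steps: List[Dist]) -> float:
    """Average KL over aligned decoding steps of the same target sequence."""
    if len(p_steps) != len(q_steps):
        raise ValueError("both paths must score the same target tokens")
    return sum(kl(p, q) for p, q in zip(p_steps, q_steps)) / len(p_steps)


def nll(steps: List[Dist], gold: List[int]) -> float:
    """Negative log-likelihood of the gold token ids under one path."""
    return -sum(math.log(dist[t]) for dist, t in zip(steps, gold))


def triangular_loss(
    z_path_a: List[Dist], y_path_a: List[Dist],  # hypothetical path A: x -> z -> y
    y_path_b: List[Dist], z_path_b: List[Dist],  # hypothetical path B: x -> y -> z
    z_gold: List[int], y_gold: List[int], alpha: float = 1.0,
) -> float:
    """Cross-entropy of both decompositions plus two assumed KL agreement terms."""
    ce = nll(z_path_a + y_path_a, z_gold + y_gold) + nll(y_path_b + z_path_b, y_gold + z_gold)
    kl_z = mean_token_kl(z_path_a, z_path_b)  # transcription: p(z|x) vs p(z|x,y)
    kl_y = mean_token_kl(y_path_b, y_path_a)  # translation:   p(y|x) vs p(y|x,z)
    return ce + alpha * (kl_z + kl_y)


if __name__ == "__main__":
    # Toy 3-token vocabulary; two decoding steps per sequence.
    p = [softmax([2.0, 0.5, -1.0]), softmax([0.1, 0.2, 3.0])]
    u = [softmax([0.0, 0.0, 0.0])] * 2  # uniform distributions
    print(triangular_loss(p, p, u, u, z_gold=[0, 2], y_gold=[0, 2], alpha=0.5))
```

With `alpha` set to 0 the sketch reduces to the summed cross-entropy of the two paths; the KL terms add a penalty only when the per-token distributions of the two paths diverge, which is the agreement pressure the abstract describes.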
Pages: 10590-10598
Number of pages: 9