Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

被引:0
|
作者
Du, Yichao [1 ]
Zhang, Zhirui [2 ]
Wang, Weizhi [3 ]
Chen, Boxing [2 ]
Xie, Jun [2 ]
Xu, Tong [1 ]
机构
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Alibaba DAMO Acad, Machine Intelligence Technol Lab, Hangzhou, Peoples R China
[3] Rutgers State Univ, New Brunswick, NJ USA
基金
中国国家自然科学基金;
关键词
MODELS;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular due to the potential of its less error propagation, lower latency, and fewer parameters. Given the triplet training corpus (speech,transcription,translation), the conventional high-quality E2E-ST system leverages the (speech, transcription) pair to pre-train the model and then utilizes the (speech, translation) pair to optimize it further. However, this process only involves two-tuple data at each stage, and this loose coupling fails to fully exploit the association between triplet data. In this paper, we attempt to model the joint probability of transcription and translation based on the speech input to directly leverage such triplet data. Based on that, we propose a novel regularization method for model training to improve the agreement of dual-path decomposition within triplet data, which should be equal in theory. To achieve this goal, we introduce two Kullback-Leibler divergence regularization terms into the model training objective to reduce the mismatch between output probabilities of dual-path. Then the well-trained model can be naturally transformed as the E2E-ST models by the pre-defined early stop tag. Experiments on the MuST-C benchmark demonstrate that our proposed approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while achieving better performance in the automatic speech recognition task.
引用
收藏
页码:10590 / 10598
页数:9
相关论文
共 50 条
  • [1] MULTILINGUAL END-TO-END SPEECH TRANSLATION
    Inaguma, Hirofumi
    Duh, Kevin
    Kawahara, Tatsuya
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 570 - 577
  • [2] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [3] Regularizing cross-attention learning for end-to-end speech translation with ASR and MT attention matrices
    Zhao, Xiaohu
    Sun, Haoran
    Lei, Yikun
    Xiong, Deyi
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [4] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [5] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228
  • [6] End-to-End Speech Translation with Knowledge Distillation
    Liu, Yuchen
    Xiong, Hao
    Zhang, Jiajun
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    INTERSPEECH 2019, 2019, : 1128 - 1132
  • [7] Adaptive Feature Selection for End-to-End Speech Translation
    Zhang, Biao
    Titov, Ivan
    Haddow, Barry
    Sennrich, Rico
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2533 - 2544
  • [8] MINTZAI: End-to-end Deep Learning for Speech Translation
    Etchegoyhen, Thierry
    Arzelus, Haritz
    Gete, Harritxu
    Alvarez, Aitor
    Hernaez, Inma
    Navas, Eva
    Gonzalez-Docasal, Ander
    Osacar, Jaime
    Benites, Edson
    Ellakuria, Igor
    Calonge, Eusebi
    Martin, Maite
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2020, (65): : 97 - 100
  • [9] Speaker voice normalization for end-to-end speech translation
    Xue, Zhengshan
    Shi, Tingxun
    Zhang, Xiaolei
    Xiong, Deyi
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [10] SimulSpeech: End-to-End Simultaneous Speech to Text Translation
    Ren, Yi
    Liu, Jinglin
    Tan, Xu
    Zhang, Chen
    Qin, Tao
    Zhao, Zhou
    Liu, Tie-Yan
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3787 - 3796