Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

Times cited: 0
Authors
Du, Yichao [1]
Zhang, Zhirui [2]
Wang, Weizhi [3]
Chen, Boxing [2]
Xie, Jun [2]
Xu, Tong [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Alibaba DAMO Acad, Machine Intelligence Technol Lab, Hangzhou, Peoples R China
[3] Rutgers State Univ, New Brunswick, NJ USA
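Funding
National Natural Science Foundation of China;
Keywords
MODELS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract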
End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular because of its potential for less error propagation, lower latency, and fewer parameters. Given a triplet training corpus (speech, transcription, translation), conventional high-quality E2E-ST systems use the (speech, transcription) pairs to pre-train the model and then use the (speech, translation) pairs to optimize it further. However, each stage involves only two-tuple data, and this loose coupling fails to fully exploit the associations within the triplet data. In this paper, we model the joint probability of the transcription and translation conditioned on the speech input, so as to leverage the triplet data directly. Building on this, we propose a novel regularization method for model training that improves the agreement between the two decomposition paths of this joint probability, which should be equal in theory. To this end, we add two Kullback-Leibler divergence regularization terms to the training objective to reduce the mismatch between the output probabilities of the two paths. The trained model can then be naturally converted into an E2E-ST model using a pre-defined early-stop tag. Experiments on the MuST-C benchmark show that our approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while also achieving better performance on the automatic speech recognition task.
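The "dual-path decomposition" rests on the chain rule: for speech s, transcription x, and translation y, P(x | s) P(y | s, x) = P(x, y | s) = P(y | s) P(x | s, y). The toy sketch below illustrates that identity and one plausible agreement penalty. It assumes tiny enumerable output spaces (single-token x and y), and it assumes the two KL terms are the two directions of KL between the path-wise joint distributions; the abstract does not give the exact formulation, and the symbols s, x, y and all function names here are hypothetical, not the authors' code.

```python
import math
from typing import Dict, Tuple

Dist = Dict[str, float]
Joint = Dict[Tuple[str, str], float]


def joint_via_transcription(p_x: Dist, p_y_given_x: Dict[str, Dist]) -> Joint:
    """Path s -> x -> y: P(x, y | s) = P(x | s) * P(y | s, x)."""
    return {(x, y): p_x[x] * p_y_given_x[x][y]
            for x in p_x for y in p_y_given_x[x]}


def joint_via_translation(p_y: Dist, p_x_given_y: Dict[str, Dist]) -> Joint:
    """Path s -> y -> x: P(x, y | s) = P(y | s) * P(x | s, y), keyed as (x, y)."""
    return {(x, y): p_y[y] * p_x_given_y[y][x]
            for y in p_y for x in p_x_given_y[y]}


def kl(p: Joint, q: Joint) -> float:
    """KL(p || q); assumes q[k] > 0 wherever p[k] > 0 (same support)."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def agreement_loss(p1: Joint, p2: Joint) -> float:
    """Hypothetical regularizer: two KL terms, one in each direction."""
    return kl(p1, p2) + kl(p2, p1)


# Toy "model outputs" for one speech input s: transcription x in {a, b},
# translation y in {A, B}. Both paths are derived from the same joint
# distribution, so they agree and the penalty is (numerically) zero.
p_x = {"a": 0.5, "b": 0.5}
p_y_given_x = {"a": {"A": 0.8, "B": 0.2}, "b": {"A": 0.4, "B": 0.6}}
p_y = {"A": 0.6, "B": 0.4}
p_x_given_y = {"A": {"a": 2 / 3, "b": 1 / 3}, "B": {"a": 0.25, "b": 0.75}}

path1 = joint_via_transcription(p_x, p_y_given_x)
consistent_loss = agreement_loss(path1, joint_via_translation(p_y, p_x_given_y))

# A mismatched second path (wrong P(y | s)) yields a positive penalty.
p_y_bad = {"A": 0.5, "B": 0.5}
inconsistent_loss = agreement_loss(path1, joint_via_translation(p_y_bad, p_x_given_y))

print(f"consistent: {consistent_loss:.6f}, inconsistent: {inconsistent_loss:.6f}")
```

In a real E2E-ST model the outputs are token sequences, so such a penalty would be computed over per-token output distributions rather than by enumerating every (x, y) pair as this sketch does.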
Pages: 10590-10598
Page count: 9