Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution

Cited by: 0
Authors
Ko, Yuka [1]
Sudoh, Katsuhito [1]
Sakti, Sakriani [1]
Nakamura, Satoshi [1,2]
Affiliations
[1] Nara Institute of Science and Technology (NAIST), Ikoma 630-0192, Japan
[2] The Chinese University of Hong Kong, Shenzhen, School of Data Science, Shenzhen, China
Keywords
end-to-end speech translation; spoken language translation; multi-task learning; knowledge distillation
DOI
10.1587/transinf.2023EDP7249
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
End-to-end speech translation (ST) directly translates source-language speech into the target language, without the intermediate automatic speech recognition (ASR) output used in a cascade approach, and thus avoids error propagation from intermediate ASR results. Although recent work has applied multi-task learning with an auxiliary ASR task to improve ST performance, such methods use a cross-entropy (CE) loss against one-hot references in the ASR task, so the trained ST models do not consider possible ASR confusion. In this study, we propose a novel multi-task learning framework for end-to-end ST that uses an ASR-based loss computed against posterior distributions obtained from a pre-trained ASR model, called the ASR posterior-based loss (ASR-PBL). ASR-PBL enables an ST model to reflect possible ASR confusion among competing hypotheses with similar pronunciations and can be applied to a strong multi-task ST baseline with a hybrid CTC/attention ASR task loss. In experiments on the Fisher Spanish-to-English corpus, the proposed method achieved better BLEU scores than the baseline trained with the standard CE loss.
Pages: 1322-1331
Page count: 10
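The abstract describes replacing the one-hot CE loss on the auxiliary ASR task with a loss against a pre-trained ASR model's posterior distribution. Below is a minimal PyTorch-style sketch of that general idea, assuming a knowledge-distillation-style KL-divergence formulation and a simple linear interpolation of the two losses; the function names, temperature, and mixing weight are illustrative assumptions, not the paper's exact formulation, and the hybrid CTC/attention component and padding masks are omitted for brevity.

```python
# Minimal sketch (not the authors' implementation) of multi-task training with
# an ASR posterior-based auxiliary loss, as summarized in the abstract.
import torch
import torch.nn.functional as F


def asr_posterior_based_loss(student_asr_logits, teacher_asr_logits, temperature=1.0):
    """Soft-target loss for the auxiliary ASR branch: KL divergence between the
    ST model's ASR-branch distribution and a pre-trained ASR model's posterior,
    instead of CE against one-hot transcripts. (KL and temperature are assumptions.)"""
    student_log_probs = F.log_softmax(student_asr_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_asr_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2


def multitask_loss(st_logits, target_tokens, student_asr_logits,
                   teacher_asr_logits, pad_id=0, asr_weight=0.3):
    """Interpolate the primary ST cross-entropy loss with the auxiliary ASR
    posterior-based loss; `asr_weight` is a hypothetical mixing hyperparameter."""
    st_loss = F.cross_entropy(
        st_logits.reshape(-1, st_logits.size(-1)),  # (batch * time, vocab)
        target_tokens.reshape(-1),                  # (batch * time,)
        ignore_index=pad_id,
    )
    asr_loss = asr_posterior_based_loss(student_asr_logits, teacher_asr_logits)
    return (1.0 - asr_weight) * st_loss + asr_weight * asr_loss
```

In a training loop, `teacher_asr_logits` would come from a frozen pre-trained ASR model run on the same speech input, so the auxiliary target reflects the recognizer's confusion among similarly pronounced hypotheses rather than a single one-hot transcript.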
Related Papers (showing 21-30 of 50)
  • [21] End-to-End Speech-to-Text Translation: A Survey
    Sethiya, Nivedita
    Maurya, Chandresh Kumar
    COMPUTER SPEECH AND LANGUAGE, 2025, 90
  • [22] Self-Training for End-to-End Speech Translation
    Pino, Juan
    Xu, Qiantong
    Ma, Xutai
    Dousti, Mohammad Javad
    Tang, Yun
    INTERSPEECH 2020, 2020, : 1476 - 1480
  • [23] Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University
    Bahar, Parnia
    Wilken, Patrick
    Alkhouli, Tamer
    Guta, Andreas
    Golik, Pavel
    Matusov, Evgeny
    Herold, Christian
    17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020), 2020, : 44 - 54
  • [24] Fluent Translations from Disfluent Speech in End-to-End Speech Translation
    Salesky, Elizabeth
    Sperber, Matthias
    Waibel, Alex
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 2786 - 2792
  • [25] An Experimental Methodology for an End-to-End Evaluation in Speech-to-Speech Translation
    Hamon, Olivier
    Mostefa, Djamel
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 3539 - 3546
  • [26] End-to-end evaluation in JANUS: A speech-to-speech translation system
    Gates, D
    Lavie, A
    Levin, L
    Waibel, A
    Gavalda, M
    Mayfield, L
    Woszczyna, M
    Zhan, PM
    DIALOGUE PROCESSING IN SPOKEN LANGUAGE SYSTEMS, 1997, 1236 : 195 - 206
  • [27] Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution
    Zhang, Zi-qiang
    Song, Yan
    Zhang, Jian-shu
    McLoughlin, Ian
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 3580 - 3584
  • [28] Insights on Neural Representations for End-to-End Speech Recognition
    Ollerenshaw, Anna
    Jalal, Asif
    Hain, Thomas
    INTERSPEECH 2021, 2021, : 4079 - 4083
  • [29] Curriculum Pre-training for End-to-End Speech Translation
    Wang, Chengyi
    Wu, Yu
    Liu, Shujie
    Zhou, Ming
    Yang, Zhenglu
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3728 - 3738
  • [30] EXPLORING NEURAL TRANSDUCERS FOR END-TO-END SPEECH RECOGNITION
    Battenberg, Eric
    Chen, Jitong
    Child, Rewon
    Coates, Adam
    Gaur, Yashesh
    Li, Yi
    Liu, Hairong
    Satheesh, Sanjeev
    Sriram, Anuroop
    Zhu, Zhenyao
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 206 - 213