ASR Posterior-based Loss for Multi-task End-to-end Speech Translation

Cited by: 3
Authors
Ko, Yuka [1]
Sudoh, Katsuhito [1, 2]
Sakti, Sakriani [1, 2]
Nakamura, Satoshi [1, 2]
Affiliations
[1] Nara Inst Sci & Technol, Ikoma, Nara, Japan
[2] RIKEN Ctr Adv Intelligence Project AIP, Tokyo, Japan
Keywords
end-to-end speech translation; multi-task learning; spoken language translation
DOI
10.21437/Interspeech.2021-1105
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
End-to-end speech translation (ST) translates source-language speech directly into the target language without the intermediate automatic speech recognition (ASR) output used in a cascading approach. End-to-end ST has the advantage of avoiding error propagation from intermediate ASR results, but its performance still lags behind the cascading approach. A recent effort to improve performance is multi-task learning with ASR as an auxiliary task. However, previous multi-task learning for end-to-end ST applies cross-entropy (CE) loss in the ASR task against one-hot references and does not consider ASR confusion. In this study, we propose a novel end-to-end ST training method that computes the ASR loss against ASR posterior distributions given by a pre-trained model, which we call ASR posterior-based loss. The proposed method is expected to account for possible ASR confusion due to competing hypotheses with similar pronunciations. In our Fisher Spanish-to-English translation experiments, the proposed method achieved better BLEU results than the baseline that uses standard CE loss with label smoothing.
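The contrast drawn in the abstract, between an ASR loss against a label-smoothed one-hot reference and an ASR loss against a pre-trained model's posterior, can be illustrated with a small sketch. This is not the authors' implementation: the toy vocabulary, the logits, the smoothing value, the function names, and the way the ST and ASR losses are combined (a simple weighted sum with a hypothetical `asr_weight`) are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def softmax(logits):
    return [math.exp(x) for x in log_softmax(logits)]


def soft_cross_entropy(target_probs, logits):
    """Cross entropy of the model's logits against any target distribution."""
    log_q = log_softmax(logits)
    return -sum(p * lq for p, lq in zip(target_probs, log_q))


def label_smoothed_target(ref_index, vocab_size, eps=0.1):
    """One-hot reference with eps spread uniformly over the other tokens."""
    off = eps / (vocab_size - 1)
    return [1.0 - eps if i == ref_index else off for i in range(vocab_size)]


def multitask_loss(st_loss, asr_loss, asr_weight=0.5):
    """Hypothetical weighted sum of the ST loss and the auxiliary ASR loss."""
    return (1.0 - asr_weight) * st_loss + asr_weight * asr_loss


# Toy example (all values are made up): the reference token is "casa",
# and the pre-trained ASR model finds "caza" acoustically confusable.
vocab = ["casa", "caza", "mesa", "</s>"]
student_logits = [2.0, 1.5, -1.0, -2.0]   # ASR branch of the ST model
teacher_logits = [3.0, 2.5, -2.0, -3.0]   # pre-trained ASR model

# Baseline target: label smoothing treats every non-reference token alike.
smoothed = label_smoothed_target(0, len(vocab))
loss_ce = soft_cross_entropy(smoothed, student_logits)

# Posterior-based target: probability mass concentrates on confusable tokens.
teacher_posterior = softmax(teacher_logits)
loss_posterior = soft_cross_entropy(teacher_posterior, student_logits)
```

In this toy setting the smoothed target gives "caza" the same small probability as every other non-reference token, whereas the teacher posterior gives it a much larger share, which is the kind of confusion information the abstract says one-hot references ignore.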
Pages: 2272-2276
Page count: 5
Related papers
50 in total
  • [1] Rethinking and Improving Multi-task Learning for End-to-end Speech Translation
    Zhang, Yuhao
    Xu, Chen
    Li, Bei
    Chen, Hao
    Xiao, Tong
    Zhang, Chunliang
    Zhu, Jingbo
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 10753 - 10765
  • [2] Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution
    Ko, Yuka
    Sudoh, Katsuhito
    Sakti, Sakriani
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (10) : 1322 - 1331
  • [3] End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs
    Kano, Takatomo
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1342 - 1355
  • [4] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
    Yadavalli, Aditya
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    INTERSPEECH 2022, 2022, : 1387 - 1391
  • [5] Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction
    Qiu, David
    He, Yanzhang
    Li, Qiujia
    Zhang, Yu
    Gao, Liangliang
    McGraw, Ian
    INTERSPEECH 2021, 2021, : 4074 - 4078
  • [6] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [7] JOINT CTC-ATTENTION BASED END-TO-END SPEECH RECOGNITION USING MULTI-TASK LEARNING
    Kim, Suyoun
    Hori, Takaaki
    Watanabe, Shinji
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4835 - 4839
  • [8] SPEECH ENHANCEMENT AIDED END-TO-END MULTI-TASK LEARNING FOR VOICE ACTIVITY DETECTION
    Tan, Xu
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6823 - 6827
  • [9] Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition
    Kurata, Gakuto
    Audhkhasi, Kartik
    INTERSPEECH 2019, 2019, : 1636 - 1640
  • [10] End-to-end multi-task optimization model for task-based dialogue systems
    Zhao F.
    Qiu M.
    Li X.
    Sun Y.
    Yang Z.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2023, 29 (11) : 3592 - 3599