End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

Cited by: 0
Authors:
Gállego, Gerard I. [1]
Tsiamas, Ioannis [1]
Escolano, Carlos [1]
Fonollosa, José A. R. [1]
Costa-jussà, Marta R. [1]
Affiliations:
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
Funding:
European Research Council
Keywords: none listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
This paper describes the UPC Machine Translation group's submission to the IWSLT 2021 offline speech translation task. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end, and may use a custom or the given segmentation. Our submission is an end-to-end speech translation system that combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique that trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it speeds up convergence and improves the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains a BLEU score of 28.22 on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 to identify periods of untranscribable text; it brings improvements of 2.5 to 3 BLEU points on the IWSLT 2019 test set, compared to the result with the given segmentation.
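The core idea in the abstract (a pre-trained Wav2Vec 2.0 encoder and an mBART decoder joined by trainable coupling modules, with the large pre-trained components frozen) can be illustrated with a minimal sketch, assuming PyTorch and HuggingFace Transformers. The checkpoint identifiers, class names, and Adapter bottleneck size below are illustrative assumptions, not the authors' exact configuration; the paper's efficient fine-tuning strategy additionally unfreezes selected sub-modules of the pre-trained components to reach the reported 20% of trainable parameters, which this sketch only notes in comments.

```python
import torch
import torch.nn as nn
from transformers import MBartForConditionalGeneration, Wav2Vec2Model
from transformers.modeling_outputs import BaseModelOutput


class Adapter(nn.Module):
    """Bottleneck adapter: LayerNorm, down-projection, ReLU, up-projection, residual."""

    def __init__(self, dim: int, bottleneck: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(self.norm(x))))


class SpeechTranslationModel(nn.Module):
    """Wav2Vec 2.0 encoder coupled to an mBART decoder through an adapter."""

    def __init__(self):
        super().__init__()
        # Illustrative checkpoints: the paper combines Wav2Vec 2.0 and
        # mBART, but these exact identifiers are assumptions.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-lv60")
        self.mbart = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
        enc_dim = self.encoder.config.hidden_size
        dec_dim = self.mbart.config.d_model
        # Coupling modules between the pre-trained encoder and decoder.
        self.adapter = Adapter(enc_dim)
        self.proj = nn.Linear(enc_dim, dec_dim)
        # Freeze both pre-trained components. The paper's efficient
        # fine-tuning unfreezes additional sub-modules to reach ~20%
        # trainable parameters; here only the coupling modules train.
        for p in self.encoder.parameters():
            p.requires_grad = False
        for p in self.mbart.parameters():
            p.requires_grad = False

    def forward(self, input_values: torch.Tensor, labels: torch.Tensor):
        speech = self.encoder(input_values).last_hidden_state
        coupled = self.proj(self.adapter(speech))
        # Hand the coupled speech representation to mBART as encoder
        # output and train with the standard seq2seq cross-entropy loss.
        return self.mbart(
            encoder_outputs=BaseModelOutput(last_hidden_state=coupled),
            labels=labels,
        )
```

A quick check of the trainable share under this sketch (much lower than the paper's 20%, since only the coupling modules are unfrozen here):

```python
model = SpeechTranslationModel()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{100 * trainable / total:.2f}% of {total:,} parameters are trainable")
```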
Pages: 110-119
Page count: 10
Related papers (10 of 50 shown):
  • [1] Xu, Chen; Liu, Xiaoqian; Liu, Xiaowen; Wang, Laohu; Huang, Canan; Xiao, Tong; Zhu, Jingbo. The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task. IWSLT 2021: The 18th International Conference on Spoken Language Translation, 2021: 92-99.
  • [2] Zhang, Biao; Sennrich, Rico. Edinburgh's End-to-End Multilingual Speech Translation System for IWSLT 2021. IWSLT 2021: The 18th International Conference on Spoken Language Translation, 2021: 160-168.
  • [3] Zhu, Shaolin; Li, Shangjie; Lei, Yikun; Xiong, Deyi. PEIT: Bridging the Modality Gap with Pre-trained Models for End-to-End Image Translation. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Long Papers, Vol. 1, 2023: 13433-13447.
  • [4] Lu, Zhiyun; Cao, Liangliang; Zhang, Yu; Chiu, Chung-Cheng; Fan, James. Speech Sentiment Analysis via Pre-trained Features from End-to-End ASR Models. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 7149-7153.
  • [5] Matsuura, Kohei; Ashihara, Takanori; Moriya, Takafumi; Tanaka, Tomohiro; Kano, Takatomo; Ogawa, Atsunori; Delcroix, Marc. Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization. Interspeech 2023, 2023: 2943-2947.
  • [6] Cao, Tengfei; He, Liang; Niu, Fangjing. End-to-end speech topic classification based on pre-trained model Wavlm. 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022: 369-373.
  • [7] Deng, Keqi; Yang, Zehui; Watanabe, Shinji; Higuchi, Yosuke; Cheng, Gaofeng; Zhang, Pengyuan. Improving Non-Autoregressive End-to-End Speech Recognition with Pre-trained Acoustic and Language Models. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 8522-8526.
  • [8] Wang, Zirui; Song, Minqi; Zhou, Dongbo. BERTIVITS: The Posterior Encoder Fusion of Pre-Trained Models and Residual Skip Connections for End-to-End Speech Synthesis. Applied Sciences-Basel, 2024, 14(12).
  • [9] Potapczyk, Tomasz; Przybysz, Pawel. SRPOL's System for the IWSLT 2020 End-to-End Speech Translation Task. 17th International Conference on Spoken Language Translation (IWSLT 2020), 2020: 89-94.
  • [10] Wang, Yuan; Li, Zekun; Zeng, Leilei; Zhao, Tingting. End-to-End Pre-trained Dialogue System for Automatic Diagnosis. CCKS 2021 - Evaluation Track, 2022, 1553: 82-91.