ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

被引:0
|
作者
Le, Chenyang [1 ,4 ]
Qian, Yao [2 ]
Zhou, Long [3 ]
Liu, Shujie [3 ]
Qian, Yanmin [1 ]
Zeng, Michael [2 ]
Huang, Xuedong [2 ]
机构
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Microsoft Cloud & AI, Redmond, WA USA
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Microsoft, Redmond, WA USA
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks. Particularly, we propose to incorporate cross-modality learning into transfer learning and conduct them simultaneously for downstream tasks in a multi-task learning manner. Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks, achieving a new state-of-the-art average BLEU score of 31.5 on the multilingual speech to English text translation task for 21 languages, as measured on the public CoVoST2 evaluation set.(2)
引用
收藏
页数:12
相关论文
共 50 条
  • [41] End-to-End Simultaneous Speech Translation with Differentiable Segmentation
    Zhang, Shaolei
    Feng, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7659 - 7680
  • [42] Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
    Fukuda, Ryo
    Sudoh, Katsuhito
    Nakamura, Satoshi
    INTERSPEECH 2022, 2022, : 121 - 125
  • [43] An End-to-End Chinese Speech Recognition Algorithm Integrating Language Model
    Lü, Kun-Ru
    Wu, Chun-Guo
    Liang, Yan-Chun
    Yuan, Yu-Ping
    Ren, Zhi-Min
    Zhou, You
    Shi, Xiao-Hu
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2021, 49 (11): : 2177 - 2185
  • [44] A Comparison of Hybrid and End-to-End ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge
    Perero-Codosero, Juan M.
    Espinoza-Cuadros, Fernando M.
    Hernandez-Gomez, Luis A.
    APPLIED SCIENCES-BASEL, 2022, 12 (02):
  • [45] LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
    Chen, Xi
    Zhang, Songyang
    Bai, Qibing
    Chen, Kai
    Nakamura, Satoshi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 6976 - 6987
  • [46] Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
    Matsuura, Kohei
    Ueno, Sei
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2622 - 2628
  • [47] Consecutive Decoding for Speech-to-text Translation
    Dong, Qianqian
    Wang, Mingxuan
    Zhou, Hao
    Xu, Shuang
    Xu, Bo
    Li, Lei
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 12738 - 12748
  • [48] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312
  • [49] EXTENDING PARROTRON: AN END-TO-END, SPEECH CONVERSION AND SPEECH RECOGNITION MODEL FOR ATYPICAL SPEECH
    Doshi, Rohan
    Chen, Youzheng
    Jiang, Liyang
    Zhang, Xia
    Biadsy, Fadi
    Ramabhadran, Bhuvana
    Chu, Fang
    Rosenberg, Andrew
    Moreno, Pedro J.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6988 - 6992
  • [50] TOWARDS UNSUPERVISED SPEECH-TO-TEXT TRANSLATION
    Chung, Yu-An
    Weng, Wei-Hung
    Tong, Schrasing
    Glass, James
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7170 - 7174