Strategies for improving low resource speech to text translation relying on pre-trained ASR models

Cited: 0
Authors
Kesiraju, Santosh [1 ]
Sarvas, Marek [1 ]
Pavlicek, Tomas [2 ]
Macaire, Cecile [3 ]
Ciuba, Alejandro [4 ]
Affiliations
[1] Brno Univ Technol, Speech FIT, Brno, Czech Republic
[2] Phonexia, Brno, Czech Republic
[3] Univ Grenoble Alpes, Grenoble, France
[4] Univ Pittsburgh, Pittsburgh, PA 15260 USA
Source
INTERSPEECH 2023, 2023
Funding
US National Science Foundation; EU Horizon 2020
Keywords
speech translation; low-resource; multilingual; speech recognition;
DOI
10.21437/Interspeech.2023-2506
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition system acts as a good initialization under low-resource scenarios. Furthermore, using CTC as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyper-parameters) that contribute most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on the Tamasheq-French data, outperforming prior published work from IWSLT 2022 by 1.6 points.
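The abstract describes training the encoder-decoder ST model with an auxiliary CTC objective alongside the usual attention decoder loss. The sketch below is a minimal PyTorch illustration of such a joint objective, interpolating a CTC loss (encoder frames aligned directly to translation tokens) with the decoder's cross-entropy; the function name joint_st_loss, the tensor shapes, the 0.3 weight, and the omission of details such as label smoothing and target shifting are assumptions for illustration, not the paper's exact implementation.

```python
import torch.nn.functional as F

def joint_st_loss(ctc_logits, enc_lens, dec_logits, tgt_tokens, tgt_lens,
                  blank_id=0, pad_id=1, ctc_weight=0.3):
    # Hypothetical joint CTC + attention objective for speech translation.
    # ctc_logits: (B, T, V) frame-level scores from a CTC head on the encoder
    # dec_logits: (B, U, V) scores from the autoregressive decoder
    # tgt_tokens: (B, U) translation token ids, padded with pad_id
    # enc_lens, tgt_lens: (B,) true encoder-frame / target-token lengths

    # CTC branch: map encoder frames directly onto the translation tokens
    log_probs = F.log_softmax(ctc_logits, dim=-1).transpose(0, 1)  # (T, B, V)
    ctc = F.ctc_loss(log_probs, tgt_tokens, enc_lens, tgt_lens,
                     blank=blank_id, zero_infinity=True)

    # Attention branch: token-level cross-entropy on the decoder outputs
    ce = F.cross_entropy(dec_logits.reshape(-1, dec_logits.size(-1)),
                         tgt_tokens.reshape(-1), ignore_index=pad_id)

    # Interpolate the two objectives; ctc_weight is a tunable hyper-parameter
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce
```

In a setup like this, the same interpolation weight is often reused at decoding time to combine CTC and attention scores during beam search, which is one way the CTC branch can influence the final translation as the abstract suggests.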
Pages: 2148-2152
Number of pages: 5