END-TO-END SPOKEN LANGUAGE UNDERSTANDING WITHOUT MATCHED LANGUAGE SPEECH MODEL PRETRAINING DATA

Cited by: 0
Authors
Price, Ryan [1 ]
Affiliations
[1] Interactions LLC, Franklin, MA 02038 USA
Keywords
spoken language understanding; end-to-end; data augmentation; multilingual; pretraining; recognition
DOI
10.1109/icassp40776.2020.9054573
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In contrast to conventional approaches to spoken language understanding (SLU) that consist of cascading a speech recognizer with a natural language understanding component, end-to-end (E2E) approaches for SLU infer semantics directly from the speech signal without processing it through separate subsystems. Pretraining part of the E2E models for speech recognition before finetuning the entire model for the target SLU task has proven to be an effective method to address the increased data requirements of E2E SLU models. However, transcribed corpora in the target language and domain may not always be available for pretraining an E2E SLU model. This paper proposes two strategies to improve the performance of E2E SLU models in scenarios where transcribed data for pretraining in the target language is unavailable: multilingual pretraining with mismatched languages and data augmentation using SpecAugment [1]. We demonstrate the effectiveness of these two methods for E2E SLU on two datasets, including one recently released publicly available dataset where we surpass the best previously published result despite not using any matched language data for pretraining.
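The abstract names SpecAugment-style data augmentation as one of the two strategies. As a rough illustration only (this is a generic sketch of frequency and time masking on a log-mel spectrogram, not the paper's implementation, and the function name, mask counts, and widths below are illustrative defaults, not the paper's settings):

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, max_freq_width=13,
                 num_time_masks=2, max_time_width=20, rng=None):
    """SpecAugment-style masking on a (time, freq) log-mel spectrogram.

    Zeroes out a few randomly placed bands of frequency channels and
    time steps, returning an augmented copy of the input. All parameter
    values here are illustrative, not taken from the paper.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_time, n_freq = out.shape
    # Frequency masking: zero a random band of mel channels.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_freq - width + 1)))
        out[:, start:start + width] = 0.0
    # Time masking: zero a random span of frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_time - width + 1)))
        out[start:start + width, :] = 0.0
    return out
```

In the no-matched-pretraining-data setting the abstract describes, such masking is typically applied on the fly during training so the model never sees the same spectrogram twice.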
Pages: 7979-7983
Page count: 5
Related papers
50 records in total
  • [1] END-to-END Cross-Lingual Spoken Language Understanding Model with Multilingual Pretraining
    Zhang, Xianwei
    He, Liang
    INTERSPEECH 2021, 2021, : 4728 - 4732
  • [2] Speech Model Pre-training for End-to-End Spoken Language Understanding
    Lugosch, Loren
    Ravanelli, Mirco
    Ignoto, Patrick
    Tomar, Vikrant Singh
    Bengio, Yoshua
    INTERSPEECH 2019, 2019, : 814 - 818
  • [3] End-to-End Spoken Language Understanding Without Full Transcripts
    Kuo, Hong-Kwang J.
    Tuske, Zoltan
    Thomas, Samuel
    Huang, Yinghui
    Audhkhasi, Kartik
    Kingsbury, Brian
    Kurata, Gakuto
    Kons, Zvi
    Hoory, Ron
    Lastras, Luis
    INTERSPEECH 2020, 2020, : 906 - 910
  • [4] SPEECH-LANGUAGE PRE-TRAINING FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Qian, Yao
    Bian, Ximo
    Shi, Yu
    Kanda, Naoyuki
    Shen, Leo
    Xiao, Zhen
    Zeng, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7458 - 7462
  • [5] A DATA EFFICIENT END-TO-END SPOKEN LANGUAGE UNDERSTANDING ARCHITECTURE
    Dinarelli, Marco
    Kapoor, Nikita
    Jabaian, Bassam
    Besacier, Laurent
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8519 - 8523
  • [6] TOWARDS END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Serdyuk, Dmitriy
    Wang, Yongqiang
    Fuegen, Christian
    Kumar, Anuj
    Liu, Baiyang
    Bengio, Yoshua
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5754 - 5758
  • [7] USING SPEECH SYNTHESIS TO TRAIN END-TO-END SPOKEN LANGUAGE UNDERSTANDING MODELS
    Lugosch, Loren
    Meyer, Brett H.
    Nowrouzezahrai, Derek
    Ravanelli, Mirco
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8499 - 8503
  • [8] Confidence measure for speech-to-concept end-to-end spoken language understanding
    Caubriere, Antoine
    Esteve, Yannick
    Laurent, Antoine
    Morin, Emmanuel
    INTERSPEECH 2020, 2020, : 1590 - 1594
  • [9] A Streaming End-to-End Framework For Spoken Language Understanding
    Potdar, Nihal
    Avila, Anderson R.
    Xing, Chao
    Wang, Dong
    Cao, Yiran
    Chen, Xiao
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3906 - 3914
  • [10] Semantic Complexity in End-to-End Spoken Language Understanding
    McKenna, Joseph P.
    Choudhary, Samridhi
    Saxon, Michael
    Strimel, Grant P.
    Mouchtaris, Athanasios
    INTERSPEECH 2020, 2020, : 4273 - 4277