A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING

Cited by: 8
Authors
Peng, Yifan [1]
Arora, Siddhant [1]
Higuchi, Yosuke [1]
Ueda, Yushi [1]
Kumar, Sujay [1]
Ganesan, Karthik [1]
Dalmia, Siddharth [1]
Chang, Xuankai [1]
Watanabe, Shinji [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2022
Funding
U.S. National Science Foundation;
Keywords
spoken language understanding; low resource; pre-trained models;
DOI
10.1109/SLT54892.2023.10022399
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies have achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LMs) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and find self-supervised pre-trained models to be more powerful, with the pre-trained LM and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition tasks, respectively.
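A minimal sketch (not taken from the paper) of the kind of self-supervised speech-representation extraction the abstract describes, assuming the HuggingFace transformers library; the facebook/wav2vec2-base checkpoint and the downstream-head comment are illustrative assumptions, not details from this record.

    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Load a pre-trained SSL speech encoder (assumed checkpoint; the paper
    # compares several SSL, ASR, LM, and SLU pre-trained models).
    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
    model.eval()

    waveform = torch.randn(16000)  # one second of dummy 16 kHz audio
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        # (batch, frames, hidden): frame-level speech representations that a
        # downstream SLU head (e.g., for NER or sentiment) could consume.
        features = model(**inputs).last_hidden_state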
Pages: 406 - 413
Page count: 8