A STUDY ON THE INTEGRATION OF PRE-TRAINED SSL, ASR, LM AND SLU MODELS FOR SPOKEN LANGUAGE UNDERSTANDING

Cited by: 8
Authors
Peng, Yifan [1]
Arora, Siddhant [1]
Higuchi, Yosuke [1]
Ueda, Yushi [1]
Kumar, Sujay [1]
Ganesan, Karthik [1]
Dalmia, Siddharth [1]
Chang, Xuankai [1]
Watanabe, Shinji [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2022
Funding
U.S. National Science Foundation;
Keywords
spoken language understanding; low resource; pre-trained models;
DOI
10.1109/SLT54892.2023.10022399
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies have achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LMs) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and find self-supervised pre-trained models to be more powerful, with the pre-trained LM and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition tasks, respectively.
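A minimal sketch (not taken from the paper) of the kind of self-supervised speech-representation extraction the abstract describes, assuming the HuggingFace transformers library; the facebook/wav2vec2-base checkpoint and the downstream-head comment are illustrative assumptions, not details from this record.

    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Load a pre-trained SSL speech encoder (assumed checkpoint; the paper
    # compares several SSL, ASR, LM, and SLU pre-trained models).
    extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
    model.eval()

    waveform = torch.randn(16000)  # one second of dummy 16 kHz audio
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        # (batch, frames, hidden): frame-level speech representations that a
        # downstream SLU head (e.g., for NER or sentiment) could consume.
        features = model(**inputs).last_hidden_state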
Pages: 406 - 413
Page count: 8