TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING

Cited by: 3

Authors:

Sunder, Vishal [1]

Thomas, Samuel [2]

Kuo, Hong-Kwang J. [2]

Ganhotra, Jatin [2]

Kingsbury, Brian [2]

Fosler-Lussier, Eric [1]

Affiliations:

[1] Ohio State Univ, Columbus, OH 43210 USA

[2] IBM Res AI, Yorktown Hts, NY USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

spoken dialog system; spoken language understanding; end-to-end systems

DOI:

10.1109/ICASSP43922.2022.9747871

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Dialog history plays an important role in spoken language understanding (SLU) performance in a dialog system. For end-to-end (E2E) SLU, previous work has used dialog history in text form, which makes the model dependent on a cascaded automatic speech recognizer (ASR). This rescinds the benefits of an E2E system, which is intended to be compact and robust to ASR errors. In this paper, we propose a hierarchical conversation model that is capable of directly using dialog history in speech form, making it fully E2E. We also distill semantic knowledge from the available gold conversation transcripts by jointly training a similar text-based conversation model with an explicit tying of acoustic and semantic embeddings. We also propose a novel technique that we call DropFrame to deal with the long training time incurred by adding dialog history in an E2E manner. On the HarperValleyBank dialog dataset, our E2E history integration outperforms a history-independent baseline by 7.7% absolute F1 score on the task of dialog action recognition. Our model performs competitively with the state-of-the-art history-based cascaded baseline, but uses 48% fewer parameters. In the absence of gold transcripts to fine-tune an ASR model, our model outperforms this baseline by a significant margin of 10% absolute F1 score.
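The abstract names DropFrame but does not describe its mechanics. As a purely illustrative sketch, assume it amounts to randomly discarding a fraction of the acoustic frames of each utterance during training, so that the sequences fed to the model are shorter. That reading is an assumption, not a statement of the authors' method, and the names `drop_frames` and `keep_prob` are hypothetical.

```python
import random


def drop_frames(frames, keep_prob=0.5, rng=None):
    """Return a random, order-preserving subsequence of `frames`.

    ASSUMPTION: this is one plausible reading of "DropFrame"; the
    abstract does not define the technique. `keep_prob` is a
    hypothetical parameter giving the chance that a frame is kept.
    """
    rng = rng or random.Random()
    # Keep each frame independently with probability keep_prob.
    kept = [frame for frame in frames if rng.random() < keep_prob]
    # Avoid returning an empty utterance: keep one random frame instead.
    if not kept and frames:
        kept = [rng.choice(frames)]
    return kept


if __name__ == "__main__":
    utterance = list(range(20))  # stand-in for 20 acoustic feature frames
    shortened = drop_frames(utterance, keep_prob=0.5, rng=random.Random(0))
    print(len(utterance), len(shortened))
```

Under this assumption, the shortening would apply only at training time, with full-length inputs used at inference; whether the paper does this is not stated in the abstract.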


Pages: 7497-7501

Page count: 5
