Capture Salient Historical Information: A Fast and Accurate Non-autoregressive Model for Multi-turn Spoken Language Understanding

Cited: 3
Authors
Cheng, Lizhi [1 ]
Jia, Weijia [2 ]
Yang, Wenmian [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, 800 Dongchuan Rd, Shanghai 200240, Peoples R China
[2] Beijing Normal Univ Zhuhai, BNU UIC Inst Artificial Intelligence & Future Net, Guangdong Key Lab AI & Multimodal Data Proc, BNU HKBU United Int Coll, 2000 Jintong Rd, Zhuhai 519087, Guangdong, Peoples R China
[3] Nanyang Technol Univ, 50 Nanyang Ave, Singapore 639798, Singapore
Keywords
Multi-task learning; spoken interfaces; task-oriented dialogue system
DOI
10.1145/3545800
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
Spoken Language Understanding (SLU), a core component of task-oriented dialogue systems, must keep inference fast to accommodate impatient human users. Existing work increases inference speed by designing non-autoregressive models for single-turn SLU tasks, but these models do not extend to multi-turn SLU, where the dialogue history must be taken into account. The intuitive remedy is to concatenate all historical utterances and apply the non-autoregressive models directly. However, this approach loses salient historical information and suffers from the uncoordinated-slot problem. To overcome these shortcomings, we propose a novel model for multi-turn SLU named Salient History Attention with Layer-Refined Transformer (SHA-LRT), which comprises a SHA module, a Layer-Refined Mechanism (LRM), and a Slot Label Generation (SLG) task. SHA captures salient historical information for the current dialogue from both historical utterances and historical results via a well-designed history-attention mechanism. LRM predicts preliminary SLU results from the Transformer's middle states and uses them to guide the final prediction, and SLG supplies sequential dependency information to the non-autoregressive encoder. Experiments on public datasets show that our model significantly improves multi-turn SLU performance (by 17.5% on Overall) while accelerating inference nearly 15-fold over the state-of-the-art baseline, and that it is also effective on single-turn SLU tasks.
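The history-attention idea in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's SHA module: it only shows how a current-turn encoding might attend over encodings of past turns to produce a salient-history summary. The function names, tensor shapes, and the scaled dot-product scoring are all hypothetical choices for illustration.

    # Minimal illustrative sketch (NOT the paper's implementation): the
    # current-turn encoding attends over encodings of historical
    # utterances/results to produce a salient-history summary vector.
    # All names and shapes here are hypothetical.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def history_attention(current, history):
        """current: (d,) encoding of the current utterance.
        history: (n_turns, d) encodings of historical turns.
        Returns a (d,) attention-weighted summary of the history."""
        d = current.shape[-1]
        scores = history @ current / np.sqrt(d)  # scaled dot-product scores, (n_turns,)
        weights = softmax(scores)                # attention distribution over turns
        return weights @ history                 # weighted sum of turn encodings, (d,)

    # Toy usage: 3 historical turns, 8-dimensional encodings.
    rng = np.random.default_rng(0)
    summary = history_attention(rng.normal(size=8), rng.normal(size=(3, 8)))
    print(summary.shape)  # (8,)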
Pages: 32