Capture Salient Historical Information: A Fast and Accurate Non-autoregressive Model for Multi-turn Spoken Language Understanding

Cited: 3
Authors
Cheng, Lizhi [1 ]
Jia, Weijia [2 ]
Yang, Wenmian [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, 800 Dongchuan Rd, Shanghai 200240, Peoples R China
[2] Beijing Normal Univ Zhuhai, BNU UIC Inst Artificial Intelligence & Future Net, Guangdong Key Lab AI & Multimodal Data Proc, BNU HKBU United Int Coll, 2000 Jintong Rd, Zhuhai 519087, Guangdong, Peoples R China
[3] Nanyang Technol Univ, 50 Nanyang Ave, Singapore 639798, Singapore
Keywords
Multi-task learning; spoken interfaces; task-oriented dialogue system
DOI
10.1145/3545800
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
Spoken Language Understanding (SLU), a core component of task-oriented dialogue systems, must keep inference fast to accommodate impatient human users. Existing work increases inference speed by designing non-autoregressive models for single-turn SLU tasks, but these models do not extend to multi-turn SLU, where the dialogue history must be taken into account. The intuitive remedy is to concatenate all historical utterances and apply the non-autoregressive models directly. However, this approach loses salient historical information and suffers from the uncoordinated-slot problem. To overcome these shortcomings, we propose a novel model for multi-turn SLU named Salient History Attention with Layer-Refined Transformer (SHA-LRT), which comprises a SHA module, a Layer-Refined Mechanism (LRM), and a Slot Label Generation (SLG) task. SHA captures salient historical information for the current dialogue from both historical utterances and historical results via a well-designed history-attention mechanism. LRM predicts preliminary SLU results from the Transformer's middle states and uses them to guide the final prediction, and SLG supplies sequential dependency information to the non-autoregressive encoder. Experiments on public datasets show that our model significantly improves multi-turn SLU performance (by 17.5% on Overall) while accelerating inference nearly 15-fold over the state-of-the-art baseline, and that it is also effective on single-turn SLU tasks.
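The history-attention idea in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's SHA module: it only shows how a current-turn encoding might attend over encodings of past turns to produce a salient-history summary. The function names, tensor shapes, and the scaled dot-product scoring are all hypothetical choices for illustration.

    # Minimal illustrative sketch (NOT the paper's implementation): the
    # current-turn encoding attends over encodings of historical
    # utterances/results to produce a salient-history summary vector.
    # All names and shapes here are hypothetical.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def history_attention(current, history):
        """current: (d,) encoding of the current utterance.
        history: (n_turns, d) encodings of historical turns.
        Returns a (d,) attention-weighted summary of the history."""
        d = current.shape[-1]
        scores = history @ current / np.sqrt(d)  # scaled dot-product scores, (n_turns,)
        weights = softmax(scores)                # attention distribution over turns
        return weights @ history                 # weighted sum of turn encodings, (d,)

    # Toy usage: 3 historical turns, 8-dimensional encodings.
    rng = np.random.default_rng(0)
    summary = history_attention(rng.normal(size=8), rng.normal(size=(3, 8)))
    print(summary.shape)  # (8,)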
Pages: 32