AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning

Cited by: 26
Authors
Chen, Lu [1 ]
Chen, Zhi [1 ]
Tan, Bowen [1 ]
Long, Sishan [1 ]
Gasic, Milica [2 ]
Yu, Kai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Heinrich Heine Univ Dusseldorf, D-40225 Dusseldorf, Germany
Keywords
Dialogue policy; deep reinforcement learning; graph neural networks; policy adaptation; transfer learning; state; systems
DOI
10.1109/TASLP.2019.2919872
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Dialogue policy plays an important role in task-oriented spoken dialogue systems: it determines how the system responds to users. Recently proposed deep reinforcement learning (DRL) approaches have been used for policy optimization. However, these deep models remain challenging for two reasons: first, many DRL-based policies are not sample-efficient; second, most models lack the ability to transfer a policy between different domains. In this paper, we propose a universal framework, AgentGraph, to tackle these two problems. AgentGraph combines a graph neural network (GNN) based architecture with a DRL-based algorithm, and can be regarded as a multi-agent reinforcement learning approach. Each agent corresponds to a node in a graph, which is defined according to the dialogue domain ontology. When making a decision, each agent can communicate with its neighbors on the graph. Under the AgentGraph framework, we further propose a dual GNN-based dialogue policy, which implicitly decomposes the decision in each turn into a high-level global decision and a low-level local decision. Experiments show that AgentGraph models significantly outperform traditional reinforcement learning approaches on most of the 18 tasks of the PyDial benchmark. Moreover, when transferred from a source task to a target task, these models not only achieve acceptable initial performance but also converge much faster on the target task.
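The abstract's core architectural idea — one agent per ontology node, message passing among graph neighbors, and per-agent Q-values — can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; all class, parameter, and tensor names (AgentGraphPolicy, n_steps, the chain adjacency, the GRU-based node update) are assumptions made here for clarity.

```python
# Minimal sketch (not the paper's code) of the AgentGraph idea: each slot of
# the domain ontology is an agent/node; at each message-passing step a node
# aggregates its graph neighbors' messages, then every node emits Q-values
# for its local actions. All names below are illustrative assumptions.
import torch
import torch.nn as nn


class AgentGraphPolicy(nn.Module):
    def __init__(self, n_agents, state_dim, hidden_dim, n_actions, adjacency, n_steps=2):
        super().__init__()
        self.register_buffer("adj", adjacency)            # (n_agents, n_agents) 0/1 graph
        self.n_steps = n_steps
        self.encode = nn.Linear(state_dim, hidden_dim)    # per-agent input embedding
        self.send = nn.Linear(hidden_dim, hidden_dim)     # shared message function
        self.update = nn.GRUCell(hidden_dim, hidden_dim)  # node-state update
        self.q_head = nn.Linear(hidden_dim, n_actions)    # per-agent local Q-values

    def forward(self, agent_states):
        # agent_states: (batch, n_agents, state_dim), one belief slice per slot
        h = torch.tanh(self.encode(agent_states))
        b, n, d = h.shape
        for _ in range(self.n_steps):
            msg = self.send(h)                            # what each node broadcasts
            # each node sums the messages of its graph neighbors
            agg = torch.einsum("ij,bjd->bid", self.adj, msg)
            h = self.update(agg.reshape(b * n, d), h.reshape(b * n, d)).reshape(b, n, d)
        return self.q_head(h)                             # (batch, n_agents, n_actions)


# Toy usage: 3 slot-agents on a chain graph, 8-dim belief slice per agent.
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
policy = AgentGraphPolicy(n_agents=3, state_dim=8, hidden_dim=16, n_actions=4, adjacency=adj)
q = policy(torch.randn(2, 3, 8))
print(q.shape)  # torch.Size([2, 3, 4])
```

In the paper's dual GNN variant, the per-turn decision is further decomposed into a high-level choice of which agent acts and a low-level choice of that agent's local action; a simple stand-in for that two-level scheme in this sketch would be to take the argmax first over each agent's best Q-value and then over the chosen agent's actions.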
Pages: 1378-1391
Page count: 14