Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems

Cited by: 127
Authors
Thomson, Blaise [1 ]
Young, Steve [1 ]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1TP, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Dialogue systems; Robustness; POMDP; Reinforcement learning;
DOI
10.1016/j.csl.2009.07.003
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded, statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intractable so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made, which improve the efficiency significantly compared to the original algorithm as well as compared to other POMDP-based dialogue state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems which uses a component-based policy with the episodic Natural Actor Critic algorithm. The framework proposed in this paper was tested on both simulations and in a user trial. Both indicated that using Bayesian updates of the dialogue state significantly outperforms traditional definitions of the dialogue state. Policy learning worked effectively and the learned policy outperformed all others on simulations. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state framework was shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems. (C) 2009 Elsevier Ltd. All rights reserved.
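The exact POMDP belief update that the abstract says is intractable at scale (and which the paper approximates with loopy belief propagation) can be illustrated on a toy problem. The sketch below is not the paper's algorithm; it is a minimal, hypothetical two-state example of the standard Bayesian update b'(s') ∝ P(o|s',a) · Σ_s P(s'|s,a) · b(s), with made-up transition and observation probabilities:

```python
import numpy as np

# Hypothetical toy dialogue POMDP with 2 hidden states, for a fixed
# action a and a fixed observed value o (all numbers are illustrative).
T = np.array([[0.9, 0.1],   # T[s, s'] = P(s' | s, a): state transition model
              [0.2, 0.8]])
O = np.array([0.7, 0.3])    # O[s'] = P(o | s', a): observation likelihood

def belief_update(b, T, O):
    """One exact Bayesian POMDP belief update: predict, weight, normalize."""
    predicted = T.T @ b          # sum_s P(s'|s,a) * b(s)
    unnorm = O * predicted       # multiply in the observation likelihood
    return unnorm / unnorm.sum() # renormalize to a proper distribution

b = np.array([0.5, 0.5])         # uniform prior belief over dialogue states
b_new = belief_update(b, T, O)
```

The cost of this exact update grows with the joint state space, which in a real dialogue system is the product of many concept variables; that combinatorial growth is why the paper resorts to factored, approximate inference.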
Pages: 562-588
Page count: 27