Partially observable Markov decision processes for spoken dialog systems

Cited by: 413
Authors
Williams, Jason D.
Young, Steve
Affiliations
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
spoken dialog system; dialog management; planning under uncertainty; user modelling; Markov decision processes; decision theory
DOI
10.1016/j.csl.2006.06.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In a spoken dialog system, determining which action a machine should take in a given situation is a difficult problem because automatic speech recognition is unreliable and hence the state of the conversation can never be known with certainty. Much of the research in spoken dialog systems centres on mitigating this uncertainty and recent work has focussed on three largely disparate techniques: parallel dialog state hypotheses, local use of confidence scores, and automated planning. While in isolation each of these approaches can improve action selection, taken together they currently lack a unified statistical framework that admits global optimization. In this paper we cast a spoken dialog system as a partially observable Markov decision process (POMDP). We show how this formulation unifies and extends existing techniques to form a single principled framework. A number of illustrations are used to show qualitatively the potential benefits of POMDPs compared to existing techniques, and empirical results from dialog simulations are presented which demonstrate significant quantitative gains. Finally, some of the key challenges to advancing this method - in particular scalability - are briefly outlined. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 393-422
Page count: 30