Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets

Cited by: 61
Authors
Henderson, James [1]
Lemon, Oliver [2]
Georgila, Kallirroi [2]
Affiliations
[1] Univ Geneva, Dept Informat, CH-1227 Carouge, Switzerland
[2] Univ Edinburgh, Edinburgh EH8 9LW, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1162/coli.2008.07-028-R2-05-82
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a huge policy space. To address the problem that any fixed data set will only provide information about small portions of these state and policy spaces, we propose a hybrid model that combines reinforcement learning with supervised learning. The reinforcement learning is used to optimize a measure of dialogue reward, while the supervised learning is used to restrict the learned policy to the portions of these spaces for which we have data. We also use linear function approximation to address the need to generalize from a fixed amount of data to large state spaces. To demonstrate the effectiveness of this method on this challenging task, we trained this model on the COMMUNICATOR corpus, to which we have added annotations for user actions and Information States. When tested with a user simulation trained on a different part of the same data set, our hybrid model outperforms a pure supervised learning model and a pure reinforcement learning model. It also outperforms the hand-crafted systems on the COMMUNICATOR data, according to automatic evaluation measures, improving over the average COMMUNICATOR system policy by 10%. The proposed method will improve techniques for bootstrapping and automatic optimization of dialogue management policies from limited initial data sets.
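The interplay the abstract describes (reinforcement learning with linear function approximation, restricted by supervised learning to the parts of the state space covered by the data) can be illustrated with a small sketch. This is not the authors' algorithm or feature set: the two-feature state, the three logged transitions, the rewards, the SARSA-style update, the count-based supervised model, and all names (`train_q`, `train_supervised`, `hybrid_policy`, `min_prob`) are assumptions invented for illustration.

```python
from collections import Counter, defaultdict

ACTIONS = ["ask", "close"]           # hypothetical system actions
S0, S1 = [1.0, 0.0], [1.0, 1.0]      # hypothetical features: [bias, slot_filled]

# Toy fixed data set: (state, action, reward, next_state, next_action, done)
DATA = [
    (S0, "ask", -0.6, S1, "close", False),   # asking costs a turn
    (S1, "close", 1.0, None, None, True),    # closing with the slot filled succeeds
    (S1, "close", 1.0, None, None, True),
]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_q(data, n_features, alpha=0.1, gamma=0.9, epochs=200):
    """Linear Q(s, a) = w[a] . s, fitted with SARSA-style updates on logged data."""
    w = {a: [0.0] * n_features for a in ACTIONS}
    for _ in range(epochs):
        for s, a, r, s2, a2, done in data:
            target = r if done else r + gamma * dot(w[a2], s2)
            err = target - dot(w[a], s)
            w[a] = [wi + alpha * err * xi for wi, xi in zip(w[a], s)]
    return w

def train_supervised(data):
    """Count-based stand-in for a supervised model of P(action | state)."""
    counts = defaultdict(Counter)
    for s, a, *_ in data:
        counts[tuple(s)][a] += 1
    return counts

def greedy_policy(s, w):
    """Pure RL: trust the extrapolated Q-values everywhere."""
    return max(ACTIONS, key=lambda a: dot(w[a], s))

def hybrid_policy(s, w, counts, min_prob=0.2):
    """Maximise Q only over actions the data supports in this state."""
    seen = counts.get(tuple(s))
    if not seen:
        # State never observed: fall back to the most frequent logged action.
        return Counter(a for c in counts.values() for a in c.elements()).most_common(1)[0][0]
    total = sum(seen.values())
    allowed = [a for a in ACTIONS if seen[a] / total >= min_prob]
    if not allowed:
        return seen.most_common(1)[0][0]
    return max(allowed, key=lambda a: dot(w[a], s))

w = train_q(DATA, n_features=2)
counts = train_supervised(DATA)
print(greedy_policy(S0, w))          # close (never observed in S0; Q is extrapolated)
print(hybrid_policy(S0, w, counts))  # ask
```

In this toy setting the linear model extrapolates a Q-value of about 0.5 for "close" in a state where that action was never logged, which beats the learned value of about 0.3 for "ask"; the supervised restriction keeps the hybrid policy on the action the data actually covers.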
Pages: 487-511
Page count: 25