Stochastic dynamic programming with factored representations

Cited: 171
Authors
Boutilier, C [1 ]
Dearden, R
Goldszmidt, M
Institutions
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3H5, Canada
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
[3] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
decision-theoretic planning; Markov decision processes; Bayesian networks; regression; decision trees; abstraction;
DOI
10.1016/S0004-3702(00)00033-3
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classical regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems, showing significant savings for certain types of domains. We also identify certain classes of problems for which this technique fails to perform well, and suggest extensions and related ideas that may prove useful in such circumstances. We also briefly describe an approximation scheme based on this approach. (C) 2000 Elsevier Science B.V. All rights reserved.
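To make the state-aggregation idea in the abstract concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a value function stored as a decision tree over boolean state variables, so every concrete state reaching the same leaf shares one value and would be backed up together rather than state by state. The variable names (`loaded`, `at_goal`) and values are illustrative assumptions.

```python
from itertools import product

# Hypothetical 2-variable domain: state = (loaded, at_goal), both booleans.
# Tree nodes are either ('leaf', value) or (var, subtree_false, subtree_true).
# Testing 'at_goal' first lets all at_goal=True states share a single leaf.
VALUE_TREE = ('at_goal',
              ('loaded', ('leaf', 0.0), ('leaf', 2.0)),   # at_goal = False
              ('leaf', 10.0))                             # at_goal = True

def tree_value(tree, state):
    """Evaluate a decision-tree value function at a concrete state dict."""
    if tree[0] == 'leaf':
        return tree[1]
    var, if_false, if_true = tree
    return tree_value(if_true if state[var] else if_false, state)

def leaves(tree, partial=None):
    """Yield (partial_assignment, value) pairs - one per aggregate state."""
    partial = dict(partial or {})
    if tree[0] == 'leaf':
        yield partial, tree[1]
        return
    var, if_false, if_true = tree
    yield from leaves(if_false, {**partial, var: False})
    yield from leaves(if_true, {**partial, var: True})

# The tree covers 4 concrete states with only 3 leaves (aggregate states),
# so a structured backup touches 3 cases instead of 4.
aggregates = list(leaves(VALUE_TREE))
print(len(aggregates))  # 3

for loaded, at_goal in product([False, True], repeat=2):
    s = {'loaded': loaded, 'at_goal': at_goal}
    print(s, tree_value(VALUE_TREE, s))
```

On larger problems the gap grows: a tree that ignores k of n boolean variables at some leaf aggregates 2^k states there, which is the source of the savings the abstract reports for structured domains.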
Pages: 49-107
Page count: 59