Continuous Real Time Dynamic Programming for Discrete and Continuous State MDPs

Cited by: 1
Authors
Rocha Vianna, Luis Gustavo [1 ]
Sanner, Scott [2 ,3 ]
de Barros, Leliane Nunes [1 ]
Affiliations
[1] Univ Sao Paulo, Sao Paulo, Brazil
[2] NICTA, Canberra, ACT, Australia
[3] Australian Natl Univ, Canberra, ACT, Australia
Source
2014 Brazilian Conference on Intelligent Systems (BRACIS), 2014
DOI
10.1109/BRACIS.2014.34
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Applications of automated planning under uncertainty are often modelled as discrete and continuous state Markov Decision Processes (DC-MDPs). Symbolic Dynamic Programming is the existing exact solution method for DC-MDPs; it uses eXtended Algebraic Decision Diagrams (XADDs) to represent the state value function symbolically and computes a complete state-space policy, which is very costly and limits solutions to problems of small size and depth. Real-Time Dynamic Programming (RTDP) is an efficient solution method for discrete state MDPs that provides a partial solution for a known initial state. In this paper we combine the RTDP solution with the XADD symbolic representation and computation of the value function to propose the Continuous Real Time Dynamic Programming (CRTDP) algorithm. This novel planner uses heuristic search and symbolic generalisation to efficiently update the value function by regions. We show that exploiting the initial state information greatly reduces the number of regions in the value function, allowing CRTDP to solve DC-MDPs more efficiently than standard symbolic dynamic programming in both the time and the space required for the solution.
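The discrete RTDP backbone that CRTDP builds on (trials from a known initial state, greedy action selection, and Bellman backups only on visited states) can be sketched as follows. This is a minimal illustration on a hypothetical toy MDP, not the paper's algorithm: the XADD symbolic value representation and region-based generalisation of CRTDP are omitted, and all state and action names are invented for the example.

```python
import random

# Hypothetical toy MDP: transitions[s][a] = list of (prob, next_state, reward).
transitions = {
    "s0": {"a": [(0.8, "s1", 0.0), (0.2, "s0", 0.0)],
           "b": [(1.0, "s0", 0.0)]},
    "s1": {"a": [(1.0, "goal", 1.0)],
           "b": [(1.0, "s0", 0.0)]},
    "goal": {},  # absorbing state: no actions
}
GAMMA = 0.9

def q_value(V, s, a):
    """One-step lookahead value of taking action a in state s."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions[s][a])

def rtdp(s0, trials=200, horizon=50, seed=0):
    rng = random.Random(seed)
    # Optimistic initial values play the role of an admissible heuristic.
    V = {s: 10.0 for s in transitions}
    V["goal"] = 0.0
    for _ in range(trials):
        s = s0
        for _ in range(horizon):
            if not transitions[s]:  # reached the absorbing state
                break
            # Greedy action with respect to the current value estimates.
            a = max(transitions[s], key=lambda a: q_value(V, s, a))
            V[s] = q_value(V, s, a)  # Bellman backup on the visited state only
            # Sample a successor to continue the trial.
            p_acc, u = 0.0, rng.random()
            for p, s2, r in transitions[s][a]:
                p_acc += p
                if u <= p_acc:
                    s = s2
                    break
    return V
```

Because trials always start from `s0`, only states reachable under greedy exploration are backed up; CRTDP applies the same principle but performs each backup symbolically over a whole region of the continuous state space at once.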
Pages: 134-139 (6 pages)
References (15)
[1] Bahar R. I. 1993 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers, 1993: 188. DOI 10.1109/ICCAD.1993.580054
[2] Barto A. G., Bradtke S. J., Singh S. P. Learning to act using real-time dynamic programming. Artificial Intelligence, 1995, 72(1-2): 81-138
[3] Bellman R. E. Dynamic Programming. Princeton Landmarks in Mathematics, 1957
[4] Bonet B. Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling, 2003: 12
[5] Boutilier C. Journal of Artificial Intelligence Research, 1999, 11: 1
[6] Bresina J. L. Uncertainty in Artificial Intelligence, 2002
[7] Dean T. Computational Intelligence, 1989, 5: 142. DOI 10.1111/j.1467-8640.1989.tb00324.x
[8] Feng Zhengzhu. UAI, 2004: 154
[9] Kolobov A. AAAI Conference on Artificial Intelligence, 2012
[10] Kveton B., Hauskrecht M., Guestrin C. Solving factored MDPs with hybrid state and action variables. Journal of Artificial Intelligence Research, 2006, 27: 153-201