Efficient Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems

被引：0

作者：

Hansen, Eric A. ^{[1
]}

Abdoulahi, Ibrahim ^{[1
]}

机构：

[1] Mississippi State Univ, Dept Comp Sci & Engn, Mississippi State, MS 39762 USA

来源：

PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2015年

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Fully observable decision-theoretic planning problems are commonly modeled as stochastic shortest path (SSP) problems. For this class of planning problems, heuristic search algorithms (including LAO*, RTDP, and related algorithms), as well as the value iteration algorithm on which they are based, lack an efficient test for convergence to an epsilon-optimal policy (except in the special case of discounting). We introduce a simple and efficient test for convergence that applies to SSP problems with positive action costs. The test can detect whether a policy is proper, that is, whether it achieves the goal state with probability 1. If proper, it gives error bounds that can be used to detect convergence to an epsilon-optimal solution. The convergence test incurs no extra overhead besides computing the Bellman residual, and the performance guarantee it provides substantially improves the utility of this class of planning algorithms.

引用

页码：3283 / 3290

页数：8

共 16 条

[1]

[Anonymous], 2006, ICAPS

[2]

[Anonymous], 2006, P 21 AAAI C ARTIFICI

[3]

[Anonymous], 2016, DYNAMIC PROGRAMMING

[4] LEARNING TO ACT USING REAL-TIME DYNAMIC-PROGRAMMING [J].

BARTO, AG ;

BRADTKE, SJ ;

SINGH, SP .

ARTIFICIAL INTELLIGENCE, 1995, 72 (1-2) :81-138

[5]

Bertsekas D, 2012, DYNAMIC PROGRAMMING, V1

[6] AN ANALYSIS OF STOCHASTIC SHORTEST-PATH PROBLEMS [J].

BERTSEKAS, DP ;

TSITSIKLIS, JN .

MATHEMATICS OF OPERATIONS RESEARCH, 1991, 16 (03) :580-595

[7]

Bonet B., 2003, Proceedings, Thirteenth International Conference on Automated Planning and Scheduling, P12

[8]

Bonet B., 2003, IJCAI, P1233

[9] On the speed of convergence of value iteration on stochastic shortest-path problems [J].

Bonet, Blai .

MATHEMATICS OF OPERATIONS RESEARCH, 2007, 32 (02) :365-373

[10] Monitoring and control of anytime algorithms: A dynamic programming approach [J].

Hansen, EA ;

Zilberstein, S .

ARTIFICIAL INTELLIGENCE, 2001, 126 (1-2) :139-157

← 1 2 →