Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs

Cited by: 70

Authors:

Amato, Christopher [1]

Bernstein, Daniel S. [1]

Zilberstein, Shlomo [1]

Affiliation:

[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA

Source:

AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS | 2010 / Vol. 21 / No. 3

Funding:

U.S. National Science Foundation;

Keywords:

Decision theory; Multiagent systems; Planning under uncertainty; POMDPs; DEC-POMDPs;

DOI:

10.1007/s10458-009-9103-z

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their high computational complexity, however, presents an important research challenge. One way to address the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. Using this representation, we propose a new approach that formulates the problem as a nonlinear program, which defines an optimal policy of a desired size for each agent. This new formulation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
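To make the finite-state controller representation concrete, the following is a minimal sketch of how a fixed stochastic controller can be evaluated on a single-agent POMDP. It is not the paper's nonlinear program or its solver: it only computes the value of a given controller by iterating the standard Bellman evaluation equation, which is the kind of equation a nonlinear-programming formulation would treat as a constraint while also optimizing the controller parameters. All names (`evaluate_controller`, `T`, `O`, `R`, `psi`, `eta`) and the toy problem are assumptions made for illustration.

```python
def evaluate_controller(T, O, R, psi, eta, gamma, tol=1e-12, max_iters=10000):
    """Iteratively evaluate a fixed stochastic finite-state controller.

    Hypothetical layout (illustrative, not taken from the paper):
      T[s][a][s2]       probability of moving from state s to s2 under action a
      O[a][s2][o]       probability of observation o after action a lands in s2
      R[s][a]           immediate reward
      psi[q][a]         probability that controller node q selects action a
      eta[q][a][o][q2]  probability of moving from node q to q2 after (a, o)
    Returns V with V[q][s] = expected discounted value of node q in state s.
    """
    n_nodes, n_states = len(psi), len(T)
    n_actions, n_obs = len(R[0]), len(O[0][0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        for q in range(n_nodes):
            for s in range(n_states):
                total = 0.0
                for a in range(n_actions):
                    # Expected value of the successor (node, state) pair.
                    future = 0.0
                    for s2 in range(n_states):
                        for o in range(n_obs):
                            for q2 in range(n_nodes):
                                future += (T[s][a][s2] * O[a][s2][o]
                                           * eta[q][a][o][q2] * V[q2][s2])
                    total += psi[q][a] * (R[s][a] + gamma * future)
                new_V[q][s] = total
        # Largest change in any entry; stop once the values have settled.
        delta = max(abs(new_V[q][s] - V[q][s])
                    for q in range(n_nodes) for s in range(n_states))
        V = new_V
        if delta < tol:
            break
    return V


# Toy problem (assumed): two absorbing states, one action, one observation.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]   # each state stays where it is
O = [[[1.0], [1.0]]]               # the single observation is always emitted
R = [[1.0], [0.0]]                 # reward 1 in state 0, reward 0 in state 1
psi = [[1.0]]                      # one-node controller, always takes action 0
eta = [[[[1.0]]]]                  # the single node loops back to itself
V = evaluate_controller(T, O, R, psi, eta, gamma=0.9)

# Value of starting in node 0 under a uniform initial belief.
b0 = [0.5, 0.5]
start_value = sum(b0[s] * V[0][s] for s in range(2))
```

In this toy case V[0][0] approaches the geometric sum 1 / (1 - 0.9) = 10 and V[0][1] stays at 0. Fixing the number of nodes in `psi` and `eta` is what bounds the memory of the policy; an optimizer would search over those probabilities rather than take them as given.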


Pages: 293-320

Page count: 28
