A Fast Approximation Method for Partially Observable Markov Decision Processes

Cited by: 0

Authors:

LIU Bingbing [1]

KANG Yu [1]

JIANG Xiaofeng [1]

QIN Jiahu [1]

Affiliation:

[1] Department of Automation, University of Science and Technology of China

Source:

Journal of Systems Science & Complexity | 2018 / Vol. 31 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Lower bound; point-based; POMDP;

DOI:

Not available

Chinese Library Classification (CLC) number:

O211.62 [Markov processes];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

This paper develops a new lower bound method for POMDPs that approximates the update of a belief by the update of its non-zero states. It uses the underlying MDP to explore the optimal reachable state space from the initial belief and to select actions during value iteration, which significantly accelerates convergence. In addition, an algorithm that collects and prunes belief points based on the upper and lower bounds is presented, and experimental results show that it outperforms some state-of-the-art point-based algorithms.
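For orientation, the terms "belief update" and "non-zero states" used in the abstract can be illustrated with a minimal sketch of the standard Bayesian belief update, summing only over the states that carry non-zero probability. This is not the paper's lower bound approximation, which the abstract does not specify in detail. The function names, the array layout `T[a][s][s2]` and `O[a][s2][o]`, and the toy numbers are all hypothetical.

```python
def support(belief, eps=1e-12):
    """Return the indices of states with non-zero probability (hypothetical helper)."""
    return [s for s, p in enumerate(belief) if p > eps]


def belief_update(belief, action, obs, T, O):
    """Standard Bayes update, not the paper's approximation.

    b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s).
    T[a][s][s2] is a transition probability, O[a][s2][o] an observation
    probability. Only states in the support of `belief` are summed over.
    """
    n = len(belief)
    nonzero = support(belief)
    unnormalized = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in nonzero)
        for s2 in range(n)
    ]
    total = sum(unnormalized)  # equals Pr(obs | belief, action)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical toy model: 3 states, 1 action, 2 observations.
T = [[[0.9, 0.1, 0.0],
      [0.0, 0.8, 0.2],
      [0.0, 0.0, 1.0]]]
O = [[[0.8, 0.2],
      [0.5, 0.5],
      [0.1, 0.9]]]

b0 = [1.0, 0.0, 0.0]  # initial belief concentrated on state 0
b1 = belief_update(b0, action=0, obs=0, T=T, O=O)
```

In this toy model the initial belief has a single non-zero state, so the inner sum touches one state rather than three. Sparse beliefs of this kind are where working on the non-zero states saves computation.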


Pages: 1423-1436

Page count: 14

References (9 in total):

[1] Junyeong Seo, Youngchul Sung, Gilwon Lee, Donggun Kim. Training Beam Sequence Design for Millimeter-Wave MIMO Systems: A POMDP Framework[J]. IEEE Trans. Signal Processing, 2016 (5)
[2] Grześ Marek, Poupart Pascal, Yang Xiao, Hoey Jesse. Energy Efficient Execution of POMDP Policies[J]. IEEE Transactions on Cybernetics, 2015 (11)
[3] Grady, Devin K., Moll, Mark, Kavraki, Lydia E. Extending the Applicability of POMDP Solutions to Robotic Tasks[J]. IEEE Transactions on Robotics, 2015, 31 (04): 948-961
[4] Peng Dai, Christopher H. Lin, Mausam, Daniel S. Weld. POMDP-based control of workflows for crowdsourcing[J]. Artificial Intelligence, 2013
[5] Shani, Guy, Pineau, Joelle, Kaplow, Robert. A survey of point-based POMDP solvers[J]. Autonomous Agents and Multi-Agent Systems, 2013, 27 (01): 1-51
[6] Papadimitriou, CH, Tsitsiklis, JN. The Complexity of Markov Decision Processes[J]. Mathematics of Operations Research, 1987, 12 (03): 441-450
[7] Richard D. Smallwood, Edward J. Sondik. The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon[J]. Operations Research, 1973 (5)
[8] Hauskrecht M. Planning and control in stochastic domains with imperfect information. Massachusetts Institute of Technology, 1997
[9] Kurniawati H, Hsu D, Lee W S. SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. MIT Press, 2008
