A Special Case of Partially Observable Markov Decision Processes Problem by Event-Based Optimization

Cited by: 0
Authors: Zhang, Junyu [1]
Affiliations: [1] Sun Yat Sen Univ, Sch Math & Computat Sci, Guangzhou 510275, Guangdong, Peoples R China
Source: PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2016
Keywords: (none listed)
DOI: Not available
Chinese Library Classification (CLC): TP301 [Theory, Methods]
Subject classification code: 081202
Abstract
In this paper, we study a class of partially observable Markov decision process (POMDP) problems using the event-based optimization framework proposed in [4]. A POMDP ([7], [8]) generalizes the standard, completely observable Markov decision process (MDP) by allowing imperfect information about the system state. Policy iteration algorithms for general POMDPs have proved impractical because they are very difficult to implement, so most work on POMDPs has relied on value iteration. For the special case considered here, however, the POMDP can be reformulated as an MDP. We then apply the sensitivity-based view to derive the corresponding average-reward difference formula. Based on this formula and the idea of event-based optimization, we estimate the aggregated potentials from a single sample path and develop policy iteration (PI) algorithms.
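As a rough illustration of the sample-path approach described in the abstract, the sketch below simulates a toy model in which the policy depends only on the current observation (taken here as the "event"), estimates performance potentials from a single sample path, aggregates them by observation, and performs a greedy policy-improvement step. Everything in it, including the state space, observation map, transition matrices, rewards, and helper names such as estimate_potentials and improve, is hypothetical and chosen for illustration; it is a minimal sketch in the spirit of event-based optimization, not the algorithm developed in the paper or in [4].

import numpy as np

# A toy, fully specified model -- all numbers below are invented for illustration.
rng = np.random.default_rng(0)
S, A, O = 4, 2, 2                      # number of states, actions, observations
obs_of_state = np.array([0, 0, 1, 1])  # deterministic observation (event) for each state
P = rng.dirichlet(np.ones(S), size=(A, S))  # P[a, s, :] = next-state distribution
f = rng.uniform(0.0, 1.0, size=(S, A))      # one-step reward f(s, a)

def simulate(policy, T=100_000):
    """Run a single sample path under an observation-based policy (action = policy[obs])."""
    xs = np.empty(T, dtype=int)
    rs = np.empty(T)
    x = 0
    for t in range(T):
        a = policy[obs_of_state[x]]
        xs[t], rs[t] = x, f[x, a]
        x = rng.choice(S, p=P[a, x])
    return xs, rs

def estimate_potentials(xs, rs, K=50):
    """Estimate the average reward eta and the potentials g(s) from one sample path:
    g(s) ~ E[ sum_{k=0}^{K-1} (r_{t+k} - eta) | X_t = s ]."""
    eta = rs.mean()
    csum = np.concatenate(([0.0], np.cumsum(rs - eta)))
    g, counts = np.zeros(S), np.zeros(S)
    for t in range(len(xs) - K):
        g[xs[t]] += csum[t + K] - csum[t]
        counts[xs[t]] += 1
    return eta, g / np.maximum(counts, 1.0)

def improve(policy, xs, g):
    """Greedy improvement per observation: weight the states consistent with each
    observation by their empirical frequencies and compare actions via f + P g."""
    new_policy = policy.copy()
    for y in range(O):
        states = np.where(obs_of_state == y)[0]
        visits = np.array([(xs == s).sum() for s in states], dtype=float)
        if visits.sum() == 0:
            continue  # observation never seen on this path; keep the old action
        w = visits / visits.sum()
        q = [w @ (f[states, a] + P[a, states] @ g) for a in range(A)]
        new_policy[y] = int(np.argmax(q))
    return new_policy

policy = np.zeros(O, dtype=int)        # start with action 0 for every observation
for it in range(5):
    xs, rs = simulate(policy)
    eta, g = estimate_potentials(xs, rs)
    policy = improve(policy, xs, g)
    print(f"iteration {it}: estimated average reward = {eta:.4f}, policy = {policy.tolist()}")

Because the policy here depends only on the observation and the observation is a deterministic function of the state, the closed-loop system is an ordinary Markov chain, which is what makes single-sample-path estimation of the potentials meaningful; this mirrors, in a simplified way, the special case the abstract refers to.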
Pages: 1522-1526 (5 pages)
References (14 in total)
  • [1] Cao, Xi-Ren; Zhang, Junyu. Event-based optimization of Markov systems. IEEE Transactions on Automatic Control, 2008, 53(4): 1076-1082.
  • [2] Cao, Xi-Ren; Zhang, Junyu. The nth-order bias optimality for multichain Markov decision processes. IEEE Transactions on Automatic Control, 2008, 53(2): 496-508.
  • [3] Cao, Xi-Ren; Wang, De-Xin; Qiu, Li. Partial-information state-based optimization of partially observable Markov decision processes and the separation principle. IEEE Transactions on Automatic Control, 2014, 59(4): 921-936.
  • [4] Cao, X.-R. Basic ideas for event-based optimization of Markov systems. Discrete Event Dynamic Systems: Theory and Applications, 2005, 15(2): 169-197.
  • [5] Cao, X.-R. The relations among potentials, perturbation analysis, and Markov decision processes. Discrete Event Dynamic Systems: Theory and Applications, 1998, 8(1): 71-87.
  • [6] Cheng, H. 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014.
  • [7] Jaakkola, T. Advances in Neural Information Processing Systems 7, 1995: 345.
  • [8] Kaelbling, L. P.; Littman, M. L.; Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 1998, 101(1-2): 99-134.
  • [9] Kreutz, C.; Honerkamp, J. Controlling the continuous positive airway pressure-device using partial observable Markov decision processes. Modelling, Simulation and Optimization of Complex Processes, 2005: 273-286.
  • [10] Littman, M. L. Machine Learning: Proceedings of the Twelfth International Conference on Machine Learning, 1995: 362.