The nth-order bias optimality for multichain Markov decision processes

Cited by: 24

Authors:

Cao, Xi-Ren [1]

Zhang, Junyu [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2008 / Vol. 53 / Issue 02

Keywords:

average optimality; bias optimality; discrete-event systems; Markov decision processes (MDPs); nth-bias optimality; nth potentials; policy iteration;

DOI:

10.1109/TAC.2007.915168

Chinese Library Classification:

TP [Automation and computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we propose a new approach to the theory of finite multichain Markov decision processes (MDPs) with different performance optimization criteria. We first propose the concept of nth-order bias; then, using the average reward and bias difference formulas derived in this paper, we develop an optimization theory for finite MDPs that covers a complete spectrum from average optimality, bias optimality, to all high-order bias optimality, in a unified way. The approach is simple, direct, natural, and intuitive; it depends neither on Laurent series expansion nor on discounted MDPs. We also propose one-phase policy iteration algorithms for bias and high-order bias optimal policies, which are more efficient than the two-phase algorithms in the literature. Furthermore, we derive high-order bias optimality equations. This research is a part of our effort in developing sensitivity-based learning and optimization theory.
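For illustration only (this is not the paper's algorithm): a minimal Python sketch, assuming NumPy, of the quantities the abstract names, evaluated for a single fixed policy with transition matrix P and reward vector f. It assumes one common convention: the average reward is eta = P* f, the bias g_1 solves (I - P) g_1 + eta = f with P* g_1 = 0, and higher-order biases satisfy (I - P) g_{n+1} = -g_n with P* g_{n+1} = 0, where P* is the Cesaro limit of the powers of P. The paper's own definitions may differ in sign or normalization. The function name `nth_order_biases` and the example chain are hypothetical.

```python
import numpy as np

def nth_order_biases(P, f, n_max=2):
    """Return (eta, [g_1, ..., g_n_max]) for one fixed policy.

    Assumed convention (see the lead-in): eta = P* f, g_1 = H f,
    g_{n+1} = -H g_n, where H is the group inverse of I - P.
    """
    P = np.asarray(P, dtype=float)
    f = np.asarray(f, dtype=float)
    I = np.eye(P.shape[0])
    A = I - P
    # For a finite stochastic matrix, A = I - P has index 1, so its group
    # inverse can be computed as A (A^3)^+ A, with ^+ the pseudoinverse.
    H = A @ np.linalg.pinv(A @ A @ A) @ A
    # Limiting (Cesaro) matrix; valid for multichain P as well.
    P_star = I - A @ H
    eta = P_star @ f
    biases = [H @ f]
    for _ in range(n_max - 1):
        biases.append(-H @ biases[-1])
    return eta, biases

# Hypothetical multichain example: states 0 and 1 are absorbing,
# state 2 is transient and moves to either with probability 0.5.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.5, 0.0]])
f = np.array([1.0, 3.0, 0.0])

eta, (g1, g2) = nth_order_biases(P, f, n_max=2)
print("average reward:", eta)
print("bias:", g1)
print("2nd-order bias:", g2)
```

In this example the average reward depends on the initial state (up to rounding, 1, 3, and 2), which is the feature that distinguishes the multichain case from the ergodic one.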


Pages: 496-508

Page count: 13
