Dynamic Optimization and Learning for Renewal Systems

Cited by: 80
Author
Neely, Michael J. [1]
Affiliation
[1] University of Southern California, Department of Electrical Engineering, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation
Keywords
Markov decision problems (MDPs); stochastic processes; algorithms
DOI
10.1109/TAC.2012.2204831
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
This paper considers optimization of time averages in systems with variable-length renewal frames. Applications include power-aware and profit-aware scheduling in wireless networks, peer-to-peer networks, and transportation systems. On each frame, a new policy is implemented that affects the frame size and creates a vector of attributes. The policy can be a single decision in response to a random event observed on the frame, or a sequence of such decisions. The goal is to choose policies on each frame so as to maximize the time average of one attribute, subject to additional time-average constraints on the others. Two algorithms are developed, both based on Lyapunov optimization concepts. The first makes decisions to minimize a "drift-plus-penalty" ratio over each frame. The second is similar but does not involve a ratio. For systems that make only a single decision on each frame, both algorithms are shown to learn efficient behavior without a priori statistical knowledge. The framework is also applicable to large classes of constrained Markov decision problems. Such problems are reduced to finding an approximate solution to a simpler unconstrained stochastic shortest path problem on each frame. Approximations for the simpler problem may still suffer from a curse of dimensionality for systems with a large state space. However, our results are compatible with any approximation method and demonstrate an explicit tradeoff between performance and convergence time.
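The per-frame rule behind the first algorithm can be sketched in the standard Lyapunov-optimization notation. This is a hedged reconstruction, not a quotation from the paper: the virtual queues Q_l[r], attributes y_l[r], frame length T[r], constraint targets c_l, and tradeoff parameter V >= 0 are the usual symbols of this literature and are assumed here for illustration. For time-average constraints of the form \bar{y}_l \le c_l and a rewarded attribute y_0, on each frame r the policy is chosen to minimize a drift-plus-penalty ratio:

\pi[r] \in \arg\min_{\pi}\;
\frac{\mathbb{E}\!\left[\sum_{l=1}^{L} Q_l[r]\,\bigl(y_l[r] - c_l\,T[r]\bigr) - V\,y_0[r] \;\middle|\; Q[r]\right]}
{\mathbb{E}\bigl[\,T[r] \;\bigm|\; Q[r]\,\bigr]},
\qquad
Q_l[r+1] = \max\bigl\{Q_l[r] + y_l[r] - c_l\,T[r],\; 0\bigr\}.

The queue update keeps each constraint's running backlog bounded, which enforces \bar{y}_l \le c_l over time; the second algorithm is the analogous rule without the denominator \mathbb{E}[T[r] \mid Q[r]]. In this family of methods, a larger V typically tightens the performance gap at the cost of a longer convergence time, which is the tradeoff the abstract refers to.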
Pages: 32-46
Page count: 15