Dynamic Optimization and Learning for Renewal Systems

Cited by: 80
Author
Neely, Michael J. [1]
Affiliation
[1] University of Southern California, Department of Electrical Engineering, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation
Keywords
Markov decision problems (MDPs); stochastic processes; algorithms
DOI
10.1109/TAC.2012.2204831
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
This paper considers optimization of time averages in systems with variable-length renewal frames. Applications include power-aware and profit-aware scheduling in wireless networks, peer-to-peer networks, and transportation systems. On each frame, a new policy is implemented that affects the frame size and creates a vector of attributes. The policy can be a single decision in response to a random event observed on the frame, or a sequence of such decisions. The goal is to choose policies on each frame so as to maximize the time average of one attribute, subject to additional time-average constraints on the others. Two algorithms are developed, both based on Lyapunov optimization concepts. The first makes decisions to minimize a "drift-plus-penalty" ratio over each frame. The second is similar but does not involve a ratio. For systems that make only a single decision on each frame, both algorithms are shown to learn efficient behavior without a priori statistical knowledge. The framework is also applicable to large classes of constrained Markov decision problems. Such problems are reduced to finding an approximate solution to a simpler unconstrained stochastic shortest path problem on each frame. Approximations for the simpler problem may still suffer from a curse of dimensionality for systems with a large state space. However, our results are compatible with any approximation method and demonstrate an explicit tradeoff between performance and convergence time.
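The per-frame rule behind the first algorithm can be sketched in the standard Lyapunov-optimization notation. This is a hedged reconstruction, not a quotation from the paper: the virtual queues Q_l[r], attributes y_l[r], frame length T[r], constraint targets c_l, and tradeoff parameter V >= 0 are the usual symbols of this literature and are assumed here for illustration. For time-average constraints of the form \bar{y}_l \le c_l and a rewarded attribute y_0, on each frame r the policy is chosen to minimize a drift-plus-penalty ratio:

\pi[r] \in \arg\min_{\pi}\;
\frac{\mathbb{E}\!\left[\sum_{l=1}^{L} Q_l[r]\,\bigl(y_l[r] - c_l\,T[r]\bigr) - V\,y_0[r] \;\middle|\; Q[r]\right]}
{\mathbb{E}\bigl[\,T[r] \;\bigm|\; Q[r]\,\bigr]},
\qquad
Q_l[r+1] = \max\bigl\{Q_l[r] + y_l[r] - c_l\,T[r],\; 0\bigr\}.

The queue update keeps each constraint's running backlog bounded, which enforces \bar{y}_l \le c_l over time; the second algorithm is the analogous rule without the denominator \mathbb{E}[T[r] \mid Q[r]]. In this family of methods, a larger V typically tightens the performance gap at the cost of a longer convergence time, which is the tradeoff the abstract refers to.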
Pages: 32-46
Page count: 15