Maximizing the set of recurrent states of an MDP subject to convex constraints

Cited by: 3
Authors
Arvelo, Eduardo [1]
Martins, Nuno C. [1]
Affiliations
[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
Maximum entropy; Markov decision problems; Markov models; Convex optimization; Optimal control; CONTROLLED MARKOV-CHAINS;
DOI
10.1016/j.automatica.2014.01.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper focuses on the design of time-homogeneous, fully observed Markov decision processes (MDPs) with finite state and action spaces. The main objective is to obtain policies that generate the maximal set of recurrent states, subject to convex constraints on the set of invariant probability mass functions. We propose a design method that relies on a finitely parametrized convex program inspired by principles of entropy maximization. A numerical example is provided to illustrate these ideas. (C) 2014 Published by Elsevier Ltd.
Pages: 994-998
Number of pages: 5
Related papers
21 in total
  • [1] [Anonymous], 2011, CVX MATLAB SOFTWARE
  • [2] [Anonymous], 1990, Introduction to Algorithms
  • [3] [Anonymous], CONSTRAINED MARKOV D
  • [4] [Anonymous], 2002, Internat. Ser. Oper. Res. Management Sci.
  • [5] Control of Markov chains with safety bounds
    Arapostathis, A
    Kumar, R
    Hsu, SP
    [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2005, 2 (04) : 333 - 343
  • [6] Controlled Markov chains with safety upper bound
    Arapostathis, A
    Kumar, R
    Tangirala, S
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2003, 48 (07) : 1230 - 1234
  • [7] DISCRETE-TIME CONTROLLED MARKOV-PROCESSES WITH AVERAGE COST CRITERION - A SURVEY
    ARAPOSTATHIS, A
    BORKAR, VS
    FERNANDEZGAUCHERAND, E
    GHOSH, MK
    MARCUS, SI
    [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1993, 31 (02) : 282 - 344
  • [8] Bertsekas D. P., 2005, DYNAMIC PROGRAMMING, V1
  • [9] CONTROLLED MARKOV-CHAINS WITH CONSTRAINTS
    BORKAR, VS
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1990, 15 : 405 - 413
  • [10] Csiszar I., 2004, Foundations and Trends in Communications and Information Theory, V1, P1, DOI 10.1561/0100000004