Controlled Markov Processes With Safety State Constraints

Cited by: 23

Authors:

El Chamie, Mahmoud [1]

Yu, Yue [2]

Acikmese, Behcet [2]

Ono, Masahiro [3]

Affiliations:

[1] United Technol Res Ctr, Syst Dept, E Hartford, CT 06108 USA

[2] Univ Washington, Dept Aeronaut & Astronaut, Seattle, WA 98195 USA

[3] CALTECH, NASAs, Jet Prop Lab, Pasadena, CA 91109 USA

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2019 / Vol. 64 / No. 03

Funding:

National Science Foundation (USA);

Keywords:

Markov processes; dynamic programming; agents and autonomous systems; stochastic optimal control; constrained control; Markov decision processes; controlled Markov chains; CHAINS;

DOI:

10.1109/TAC.2018.2849556

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) for all time epochs. Typically, in the MDP framework, safety is addressed indirectly by penalizing failure states through the reward function. However, such an approach does not allow imposing hard constraints on the state pd, which could be an issue for practical applications where the chance of failure must be limited to prescribed bounds. In this paper, we explicitly separate state constraints from the reward function. We provide analysis and synthesis methods to impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches where such constraints can only be imposed on the stationary distributions. We show that, contrary to the unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs to synthesize feasible decision-making policies that satisfy safety constraints for all time epochs and ensure that the performance is above a computable lower bound. Linear programming implementations of the proposed algorithms are developed, which are formulated by using the duality theory of convex optimization. A swarm control simulation example is also provided to demonstrate the use of the proposed algorithms.
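To make the idea of a bound on the state pd holding at every time epoch concrete, here is a minimal Python sketch. It is not the paper's algorithm: the two-state, two-action MDP, the two policies, the bound `UPPER` (a simple box, which is one special case of a polytopic constraint), and all function names are hypothetical choices made only for illustration. It propagates the state pd under a fixed randomized policy and reports the first epoch at which the bound is violated.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[a][s][s2] = probability of moving from state s to state s2 under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
UPPER = [1.0, 0.6]  # assumed safety bound: x_t[s] <= UPPER[s] at every epoch t


def induced_chain(policy):
    """Row-stochastic matrix M[s][s2] = sum_a policy[s][a] * P[a][s][s2]."""
    n = len(policy)
    return [[sum(policy[s][a] * P[a][s][s2] for a in range(len(P)))
             for s2 in range(n)] for s in range(n)]


def step(x, M):
    """Advance the state pd by one epoch: x_next[s2] = sum_s x[s] * M[s][s2]."""
    n = len(x)
    return [sum(x[s] * M[s][s2] for s in range(n)) for s2 in range(n)]


def first_violation(x0, policy, horizon, tol=1e-12):
    """Return the first epoch t in 0..horizon where the bound fails, else None."""
    M = induced_chain(policy)
    x = list(x0)
    for t in range(horizon + 1):
        if any(x[s] > UPPER[s] + tol for s in range(len(x))):
            return t
        x = step(x, M)
    return None


# policy[s][a] = probability of choosing action a in state s.
mostly_switch = [[0.1, 0.9], [0.1, 0.9]]
uniform = [[0.5, 0.5], [0.5, 0.5]]

print(first_violation([0.8, 0.2], mostly_switch, horizon=20))
print(first_violation([0.5, 0.5], mostly_switch, horizon=20))
print(first_violation([0.8, 0.2], uniform, horizon=20))
```

In this toy setup, the `mostly_switch` policy keeps the pd within the bound when started at its stationary distribution `[0.5, 0.5]`, yet from the initial pd `[0.8, 0.2]`, which itself satisfies the bound, it violates the bound after one step. The `uniform` policy stays within the bound from that same initial pd. This is only meant to show why a constraint on the stationary distribution alone does not imply safety at all time epochs, and why the initial pd matters when checking a policy.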


Pages: 1003-1018

Number of pages: 16
