Safe Reinforcement Learning Using Wasserstein Distributionally Robust MPC and Chance Constraint

Cited by: 5
Authors
Kordabad, Arash Bahari [1 ]
Wisniewski, Rafael [2 ]
Gros, Sebastien [1 ]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, N-7034 Trondheim, Norway
[2] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
Keywords
Safe reinforcement learning; model predictive control; distributionally robust optimization; chance constraint; conditional value at risk; Q-learning; optimization
DOI
10.1109/ACCESS.2022.3228922
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, we address the chance-constrained safe Reinforcement Learning (RL) problem using function approximators based on Stochastic Model Predictive Control (SMPC) and Distributionally Robust Model Predictive Control (DRMPC). We use the Conditional Value at Risk (CVaR) to measure the probability of constraint violation and hence safety. To provide a policy that is safe by construction, we first propose using a parameterized nonlinear DRMPC at each time step. The DRMPC optimizes a finite-horizon cost function subject to the worst-case constraint violation over an ambiguity set, taken as a statistical ball around the empirical disturbance distribution with a radius measured by the Wasserstein metric. Unlike SMPC based on the sample average approximation, DRMPC provides a probabilistic guarantee on the out-of-sample risk and requires fewer disturbance samples. Q-learning is then used to tune the parameters of the DRMPC so as to achieve the best closed-loop performance. Path planning with obstacle avoidance for a Wheeled Mobile Robot (WMR) is used to illustrate the effectiveness of the proposed method.
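For orientation only, the following is a minimal sketch, in assumed notation not taken from the paper, of the kind of Wasserstein DRMPC stage problem the abstract describes: a finite-horizon cost is minimized subject to a CVaR constraint that must hold for the worst-case distribution in a Wasserstein ball of radius epsilon around the empirical distribution built from N disturbance samples.

% Minimal sketch (amsmath/amssymb assumed); all symbols are illustrative assumptions,
% not the paper's notation: \ell is the stage cost, V_f the terminal cost, f the dynamics,
% w_k the disturbance, h the constraint function, \hat{P}_N the empirical distribution of
% N disturbance samples, W the Wasserstein metric, and \alpha the allowed violation level.
\begin{align*}
  \min_{u_0,\dots,u_{T-1}} \quad & \sum_{k=0}^{T-1} \ell(x_k,u_k) + V_f(x_T) \\
  \text{s.t.} \quad & x_{k+1} = f(x_k,u_k,w_k), \qquad k = 0,\dots,T-1, \\
  & \sup_{P \in \mathbb{B}_\varepsilon(\hat{P}_N)} \operatorname{CVaR}^{P}_{1-\alpha}\!\big[h(x_k,u_k)\big] \le 0, \\
  & \mathbb{B}_\varepsilon(\hat{P}_N) := \big\{ P : W\big(P,\hat{P}_N\big) \le \varepsilon \big\}.
\end{align*}

In the learning layer the abstract describes, the MPC cost and constraint functions carry adjustable parameters, and Q-learning updates those parameters from closed-loop data while the DRMPC keeps the policy safe by construction.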
Pages: 130058-130067
Number of pages: 10