Global Convergence of Policy Gradient Primal-Dual Methods for Risk-Constrained LQRs

Cited by: 14
Authors
Zhao, Feiran [1 ,2 ]
You, Keyou [1 ,2 ]
Basar, Tamer [3 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, BNRist, Beijing 100084, Peoples R China
[3] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
Funding
National Natural Science Foundation of China;
Keywords
Optimization; Convergence; Costs; Optimal control; Lagrangian functions; Trajectory; Search problems; Gradient descent; policy optimization (PO); reinforcement learning; risk-constrained linear quadratic regulator (RC-LQR); stochastic control; ACTOR-CRITIC ALGORITHM; SYSTEMS;
DOI
10.1109/TAC.2023.3234176
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
While the techniques of optimal control theory are typically model-based, the policy optimization (PO) approach directly optimizes the performance metric of interest. Although PO has become an essential approach to reinforcement learning problems, its theoretical properties remain poorly understood. In this article, we study the risk-constrained linear quadratic regulator (RC-LQR) problem via the PO approach, which requires solving a challenging nonconvex constrained optimization problem. To solve it, we first build on our earlier result that an optimal policy has a time-invariant affine structure to show that the associated Lagrangian function is coercive, locally gradient dominated, and has a locally Lipschitz continuous gradient, based on which we establish strong duality. Then, we design policy gradient primal-dual methods with global convergence guarantees in both model-based and sample-based settings. Finally, we validate our methods in simulations using samples of system trajectories.
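The abstract describes a primal-dual scheme: gradient steps on the feedback gain against the Lagrangian, paired with ascent steps on the multiplier. The following is a minimal, self-contained sketch of such a loop in the sample-based setting, not the paper's algorithm: the system matrices, the fourth-moment risk proxy Jc, the budget b, the horizon, and all step sizes are illustrative assumptions, and the gradient is estimated with a generic two-point zeroth-order scheme rather than the estimator analyzed in the paper.

```python
# Hedged sketch of a policy gradient primal-dual loop for a
# risk-constrained LQR. All constants below are illustrative
# assumptions, not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # open-loop dynamics (assumed)
B = np.array([[0.0], [0.1]])             # input matrix (assumed)
Q, R = np.eye(2), np.eye(1)              # quadratic cost weights
T, b = 50, 5.0                           # horizon and risk budget (assumed)

def rollout(K):
    """One noisy trajectory under u = -Kx; returns (cost, risk proxy)."""
    x = rng.normal(size=(2, 1))
    cost = risk = 0.0
    for _ in range(T):
        u = -K @ x
        cost += (x.T @ Q @ x + u.T @ R @ u).item()
        risk += (x.T @ x).item() ** 2     # fourth-moment risk proxy (assumed)
        x = A @ x + B @ u + 0.1 * rng.normal(size=(2, 1))
    return cost / T, risk / T

def lagrangian(K, lam, n=20):
    """Monte Carlo estimate of L(K, lam) = J(K) + lam * (Jc(K) - b)."""
    samples = [rollout(K) for _ in range(n)]
    J = np.mean([s[0] for s in samples])
    Jc = np.mean([s[1] for s in samples])
    return J + lam * (Jc - b), Jc

K = np.array([[0.5, 1.0]])                # initial stabilizing gain
lam, r, eta_K, eta_lam = 0.0, 0.05, 1e-3, 1e-2
for it in range(200):
    # Primal step: two-point zeroth-order gradient estimate in K.
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)
    Lp, _ = lagrangian(K + r * U, lam)
    Lm, _ = lagrangian(K - r * U, lam)
    grad_K = (K.size * (Lp - Lm) / (2 * r)) * U
    K = K - eta_K * grad_K
    # Dual step: projected gradient ascent on the multiplier.
    _, Jc = lagrangian(K, lam)
    lam = max(0.0, lam + eta_lam * (Jc - b))

print("final gain K =", K, " multiplier lambda =", lam)
```

The two time scales mirror the primal-dual structure in the abstract: the inner primal update descends the Lagrangian in the gain, while the slower dual update enforces the risk constraint by raising the multiplier whenever the estimated risk exceeds the budget.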
Pages: 2934-2949
Page count: 16