Safe Policies for Reinforcement Learning via Primal-Dual Methods

Cited by: 24
Authors
Paternain, Santiago [1 ]
Calvo-Fullana, Miguel [2 ]
Chamon, Luiz F. O. [3 ]
Ribeiro, Alejandro [4 ]
Affiliations
[1] Rensselaer Polytech Inst, Elect Comp & Syst Engn, Troy, NY 12180 USA
[2] MIT, Dept Aeronaut & Astronaut, Cambridge, MA 02139 USA
[3] Univ Calif Berkeley, Berkeley, CA 94551 USA
[4] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Keywords
Safety; Trajectory; Reinforcement learning; Task analysis; Optimal control; Optimization; Markov processes; Autonomous systems; gradient methods; unsupervised learning; APPROXIMATION;
DOI
10.1109/TAC.2022.3152724
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
In this article, we study the design of controllers in the context of stochastic optimal control under the assumption that the model of the system is not available. That is, we aim to control a Markov decision process whose transition probabilities are unknown, but for which we have access to sample trajectories gathered through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. This formulation has two drawbacks: the problem is nonconvex, and computing the gradients of the constraints with respect to the policies is prohibitive. Hence, we propose an ergodic relaxation of the constraints with the following advantages. 1) The safety guarantees are maintained in the case of episodic tasks, and they hold up to a given time horizon for continuing tasks. 2) Despite its nonconvexity, the constrained optimization problem has an arbitrarily small duality gap if the parametrization of the controller is rich enough. 3) The gradients of the Lagrangian associated with the safe-learning problem can be computed using standard reinforcement learning results and stochastic approximation tools. Leveraging these advantages, we exploit primal-dual algorithms to find policies that are both safe and optimal. We test the proposed approach on a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and to the required safety levels.
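The primal-dual scheme outlined in the abstract, policy-gradient ascent on the Lagrangian of the relaxed constrained MDP combined with projected dual descent on the safety multiplier, can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: the toy one-dimensional environment, the tabular softmax policy, the REINFORCE-style gradient estimator, and all hyperparameters are illustrative assumptions.

```python
# Minimal primal-dual sketch for a constrained MDP (illustrative assumptions only):
# maximize expected reward subject to an ergodic safety constraint, via stochastic
# gradient ascent on the Lagrangian and projected dual descent on the multiplier.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HORIZON = 5, 2, 20
SAFE_SET = {1, 2, 3}        # assumed safe states the agent should remain in
SAFETY_LEVEL = 0.8          # required per-step probability of being safe
GAMMA = 0.95
# Discounted "budget" the safety return must meet over the horizon.
TARGET = SAFETY_LEVEL * (1 - GAMMA**HORIZON) / (1 - GAMMA)

theta = np.zeros((N_STATES, N_ACTIONS))  # tabular softmax policy parameters
lam = 0.0                                 # dual variable for the safety constraint

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def step(s, a):
    # Toy dynamics: action 0 drifts left, action 1 drifts right, plus noise.
    s_next = np.clip(s + (1 if a == 1 else -1) + rng.integers(-1, 2), 0, N_STATES - 1)
    reward = 1.0 if s_next == 3 else 0.0          # illustrative task reward
    safe = 1.0 if s_next in SAFE_SET else 0.0     # safety indicator
    return s_next, reward, safe

for it in range(2000):
    # Roll out one episode, accumulating score-function terms and returns.
    s, grad = 2, np.zeros_like(theta)
    ret_r, ret_c, disc = 0.0, 0.0, 1.0
    for t in range(HORIZON):
        p = policy(s)
        a = rng.choice(N_ACTIONS, p=p)
        g = -p                      # grad log softmax: e_a - p
        g[a] += 1.0
        grad[s] += g
        s, r, c = step(s, a)
        ret_r += disc * r
        ret_c += disc * c
        disc *= GAMMA

    # Primal step: crude single-sample REINFORCE estimate of the Lagrangian gradient.
    theta += 0.05 * grad * (ret_r + lam * ret_c)
    # Dual step: descend in lam for the constraint ret_c >= TARGET, projected to lam >= 0.
    lam = max(0.0, lam - 0.01 * (ret_c - TARGET))

print(f"final dual variable lambda = {lam:.3f}")
```

Higher-variance terms (a single-trajectory return multiplying the whole score sum) stand in for the stochastic-approximation machinery developed in the article; the point of the sketch is only the alternation between primal policy-gradient updates and projected dual updates on the safety multiplier.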
Pages: 1321-1336
Number of pages: 16
Related Papers
50 records
  • [1] Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach
    Bai, Qinbo
    Bedi, Amrit Singh
    Agarwal, Mridul
    Koppel, Alec
    Aggarwal, Vaneet
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3682 - 3689
  • [2] Real-Time Optimal Power Flow Method via Safe Deep Reinforcement Learning Based on Primal-Dual and Prior Knowledge Guidance
    Wu, Pengfei
    Chen, Chen
    Lai, Dexiang
    Zhong, Jian
    Bie, Zhaohong
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2025, 40 (01) : 597 - 611
  • [3] Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD
    Lee, Donghwan
    Yoon, Hyungjin
    Hovakimyan, Naira
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 1967 - 1972
  • [4] A projected primal-dual gradient optimal control method for deep reinforcement learning
    Gottschalk, Simon
    Burger, Michael
    Gerdts, Matthias
    JOURNAL OF MATHEMATICS IN INDUSTRY, 2020, 10 (01)
  • [5] Achieving Zero Constraint Violation for Concave Utility Constrained Reinforcement Learning via Primal-Dual Approach
    Bai, Qinbo
    Bedi, Amrit Singh
    Agarwal, Mridul
    Koppel, Alec
    Aggarwal, Vaneet
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2023, 78 : 975 - 1016
  • [6] Global Convergence of Policy Gradient Primal-Dual Methods for Risk-Constrained LQRs
    Zhao, Feiran
    You, Keyou
    Basar, Tamer
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (05) : 2934 - 2949
  • [7] Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method
    Lee, Donghwan
    Kim, Do Wan
    Hu, Jianghai
    IEEE ACCESS, 2022, 10 : 107077 - 107094
  • [8] Primal-Dual Deep Reinforcement Learning for Periodic Coverage-Assisted UAV Secure Communications
    Qin, Yunhui
    Xing, Zhifang
    Li, Xulong
    Zhang, Zhongshan
    Zhang, Haijun
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (12) : 19641 - 19652
  • [9] Safe Reinforcement Learning via Episodic Control
    Li, Zhuo
    Zhu, Derui
    Grossklags, Jens
    IEEE ACCESS, 2025, 13 : 35270 - 35280