Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

Cited by: 0

Authors:

Ma, Haitong [1,2]

Liu, Changliu [3]

Li, Shengbo Eben [1,2]

Zheng, Sifa [1,2]

Sun, Wenchao [1,2]

Chen, Jianyu [4,5]

Affiliations:

[1] Tsinghua Univ, State Key Lab Automot Safety & Energy, Sch Vehicle & Mobil, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Ctr Intelligent Connected Vehicles & Transportat, Beijing 100084, Peoples R China

[3] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA

[4] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China

[5] Shanghai Qizhi Inst, Shanghai 200232, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Constrained reinforcement learning (RL); safe RL; safety index; zero-violation policy; OPTIMIZATION;

DOI:

10.1109/TNNLS.2023.3348422

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We focus on learning zero-constraint-violation safe policies in model-free reinforcement learning (RL). Existing model-free RL studies mostly use a posterior penalty to penalize dangerous actions, which means they must experience danger in order to learn from it. Therefore, they cannot learn a zero-violation safe policy even after convergence. To handle this problem, we leverage safety-oriented energy functions to learn zero-constraint-violation safe policies and propose the safe set actor-critic (SSAC) algorithm. The energy function is designed to increase rapidly for potentially dangerous actions, which locates the safe set in the action space. Therefore, we can identify dangerous actions before taking them and achieve zero constraint violation. Our major contributions are twofold. First, we use data-driven methods to learn the energy function, which removes the requirement of known dynamics. Second, we formulate a constrained RL problem to solve for zero-violation policies. We prove theoretically that our Lagrangian-based constrained RL solutions converge to the constrained optimal zero-violation policies. The proposed algorithm is evaluated in complex simulation environments and in a hardware-in-loop (HIL) experiment with a real autonomous vehicle controller. Experimental results suggest that the converged policies in all environments achieve zero constraint violation and performance comparable to the model-based baseline.


Pages: 2327-2341

Number of pages: 15


[1] Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
Wei, Honghao
Liu, Xin
Ying, Lei
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
[2] Constrained Dirichlet Distribution Policy: Guarantee Zero Constraint Violation Reinforcement Learning for Continuous Robotic Control
Ma, Jianming
Cao, Zhanxiang
Gao, Yue
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12): 11690-11697
[3] Model-free Safe Reinforcement Learning Method Based on Constrained Markov Decision Processes
Zhu F.
Ge Y.-Y.
Ling X.-H.
Liu Q.
Ruan Jian Xue Bao/Journal of Software, 2022, 33 (08): 3086-3102
[4] Constrained model-free reinforcement learning for process optimization
Pan, Elton
Petsagkourakis, Panagiotis
Mowbray, Max
Zhang, Dongda
del Rio-Chanona, Ehecatl Antonio
COMPUTERS & CHEMICAL ENGINEERING, 2021, 154
[5] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
Liu, Yongshuai
Halev, Avishai
Liu, Xin
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021: 4508-4515
[6] Safe Reinforcement Learning via a Model-Free Safety Certifier
Modares, Amir
Sadati, Nasser
Esmaeili, Babak
Yaghmaie, Farnaz Adib
Modares, Hamidreza
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03): 3302-3311
[7] Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach
Bai, Qinbo
Bedi, Amrit Singh
Agarwal, Mridul
Koppel, Alec
Aggarwal, Vaneet
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 3682-3689
[8] Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
Bai, Qinbo
Bedi, Amrit Singh
Aggarwal, Vaneet
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023: 6737-6744
[9] Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs
Liu, Tao
Zhou, Ruida
Kalathil, Dileep
Kumar, P. R.
Tian, Chao
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[10] Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising
Wu, Di
Chen, Xiujun
Yang, Xun
Wang, Hao
Tan, Qing
Zhang, Xiaoxun
Xu, Jian
Gai, Kun
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018: 1443-1451
