Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

Cited by: 0
Authors
Ma, Haitong [1 ,2 ]
Liu, Changliu [3 ]
Li, Shengbo Eben [1 ,2 ]
Zheng, Sifa [1 ,2 ]
Sun, Wenchao [1 ,2 ]
Chen, Jianyu [4 ,5 ]
Affiliations
[1] Tsinghua Univ, State Key Lab Automot Safety & Energy, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Ctr Intelligent Connected Vehicles & Transportat, Beijing 100084, Peoples R China
[3] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[4] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
[5] Shanghai Qizhi Inst, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Constrained reinforcement learning (RL); safe RL; safety index; zero-violation policy; optimization
DOI
10.1109/TNNLS.2023.3348422
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We focus on learning zero-constraint-violation safe policies in model-free reinforcement learning (RL). Existing model-free RL studies mostly apply a posterior penalty to dangerous actions, which means the agent must experience danger in order to learn from it; consequently, they cannot learn a zero-violation safe policy even after convergence. To address this problem, we leverage safety-oriented energy functions to learn zero-constraint-violation safe policies and propose the safe set actor-critic (SSAC) algorithm. The energy function is designed to increase rapidly for potentially dangerous actions, thereby locating the safe set in the action space, so dangerous actions can be identified before they are taken and zero-constraint violation can be achieved. Our major contributions are twofold. First, we learn the energy function with data-driven methods, which removes the requirement of known dynamics. Second, we formulate a constrained RL problem to solve for zero-violation policies. We theoretically prove that our Lagrangian-based constrained RL solution converges to the constrained optimal zero-violation policy. The proposed algorithm is evaluated in complex simulation environments and in a hardware-in-the-loop (HIL) experiment with a real autonomous vehicle controller. Experimental results show that the converged policies in all environments achieve zero-constraint violation and performance comparable to the model-based baseline.
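To make the Lagrangian-based formulation described in the abstract more concrete, the sketch below illustrates how a learned safety index (energy function) can flag dangerous actions before execution and how a dual multiplier can penalize predicted violations. This is a minimal conceptual sketch based only on the abstract: the names, the toy state, and the specific decrease condition (safety_index, eta, lam) are illustrative assumptions, not the authors' SSAC implementation.

```python
# Minimal conceptual sketch of a safety-index + Lagrangian update,
# loosely following the abstract's description. All names and numbers
# are illustrative assumptions, not the paper's actual algorithm.
import numpy as np

def safety_index(state, k=1.0):
    """Hypothetical energy function phi(s): grows as the agent approaches
    the constraint boundary (here, a 1-D distance to an obstacle).
    phi > 0 is treated as potentially unsafe."""
    distance_to_obstacle = state[0]
    return k - distance_to_obstacle

def constraint_violation(phi_next, phi_now, eta=0.1):
    """Energy-decrease condition: require
    phi_next - max(phi_now - eta, 0) <= 0, i.e. the energy must shrink
    by at least eta whenever it is positive. Positive return = violation."""
    return phi_next - max(phi_now - eta, 0.0)

def lagrangian_policy_loss(reward_q, violation, lam):
    """Policy objective: maximize the reward critic while the multiplier
    lam penalizes any predicted constraint violation."""
    return -(reward_q - lam * max(violation, 0.0))

def update_multiplier(lam, violation, lr=1e-2):
    """Dual ascent: grow the multiplier while violations persist,
    projected back to lam >= 0."""
    return max(0.0, lam + lr * violation)

# Toy usage with made-up numbers.
lam = 1.0
phi_now = safety_index(np.array([0.8]))   # current state
phi_next = safety_index(np.array([0.6]))  # predicted next state (closer to obstacle)
g = constraint_violation(phi_next, phi_now)
loss = lagrangian_policy_loss(reward_q=5.0, violation=g, lam=lam)
lam = update_multiplier(lam, g)
print(loss, lam)
```

The decrease condition is the mechanism that lets danger be identified prior to acting: the constraint is evaluated on the predicted energy of the next state, so an action that would push the energy up can be penalized (and ultimately rejected) without the agent ever having to experience the unsafe outcome.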
Pages: 2327-2341
Number of pages: 15
Related Papers
50 records in total
  • [1] Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation
    Wei, Honghao
    Liu, Xin
    Ying, Lei
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022
  • [2] Constrained Dirichlet Distribution Policy: Guarantee Zero Constraint Violation Reinforcement Learning for Continuous Robotic Control
    Ma, Jianming
    Cao, Zhanxiang
    Gao, Yue
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12): 11690 - 11697
  • [3] Model-free Safe Reinforcement Learning Method Based on Constrained Markov Decision Processes
    Zhu F.
    Ge Y.-Y.
    Ling X.-H.
    Liu Q.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (08): 3086 - 3102
  • [4] Constrained model-free reinforcement learning for process optimization
    Pan, Elton
    Petsagkourakis, Panagiotis
    Mowbray, Max
    Zhang, Dongda
    del Rio-Chanona, Ehecatl Antonio
    COMPUTERS & CHEMICAL ENGINEERING, 2021, 154
  • [5] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
    Liu, Yongshuai
    Halev, Avishai
    Liu, Xin
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021: 4508 - 4515
  • [6] Safe Reinforcement Learning via a Model-Free Safety Certifier
    Modares, Amir
    Sadati, Nasser
    Esmaeili, Babak
    Yaghmaie, Farnaz Adib
    Modares, Hamidreza
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3302 - 3311
  • [7] Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach
    Bai, Qinbo
    Bedi, Amrit Singh
    Agarwal, Mridul
    Koppel, Alec
    Aggarwal, Vaneet
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 3682 - 3689
  • [8] Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
    Bai, Qinbo
    Bedi, Amrit Singh
    Aggarwal, Vaneet
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023: 6737 - 6744
  • [9] Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs
    Liu, Tao
    Zhou, Ruida
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [10] Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising
    Wu, Di
    Chen, Xiujun
    Yang, Xun
    Wang, Hao
    Tan, Qing
    Zhang, Xiaoxun
    Xu, Jian
    Gai, Kun
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018: 1443 - 1451