Concurrent Learning of Control Policy and Unknown Safety Specifications in Reinforcement Learning

Cited by: 1
Authors
Yifru, Lunet [1 ]
Baheri, Ali [2 ]
Affiliations
[1] West Virginia Univ, Morgantown, WV 26505 USA
[2] Rochester Inst Technol, Rochester, NY 14623 USA
Keywords
STL mining; safe learning; specification-guided reinforcement learning (RL); SIGNAL TEMPORAL LOGIC; BAYESIAN OPTIMIZATION;
DOI
10.1109/OJCSYS.2024.3418306
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. However, this reliance on predefined safety constraints poses limitations in dynamic and unpredictable real-world settings, where such constraints may not be available or sufficiently adaptable. Bridging this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Starting from a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task that integrates constrained policy optimization, using a Lagrangian variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization of the parameters of the given pSTL safety specification. Through comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of STL safety constraint parameters, exhibiting a high degree of conformity with the true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario with complete prior knowledge of safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to them.
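To make the bilevel structure described in the abstract concrete, the following is a minimal, self-contained sketch of the outer level only: fitting the parameter of a simple pSTL template G(x_t < theta) to a small labeled trajectory dataset via Bayesian optimization. The template, the synthetic data, all function names, and the choice of scikit-optimize as the optimizer are illustrative assumptions, not the authors' implementation; the inner constrained policy update (Lagrangian TD3) is omitted and only indicated in comments.

```python
# Hedged sketch: infer the unknown threshold of a pSTL safety template
# G(x_t < theta) from a small labeled dataset using Bayesian optimization.
# All names and data here are hypothetical placeholders for illustration.
import numpy as np
from skopt import gp_minimize  # assumed Bayesian-optimization backend

rng = np.random.default_rng(0)

# Synthetic "small initial labeled dataset": a trajectory is labeled safe
# if it never exceeds the true (unknown to the learner) threshold 0.7.
TRUE_THETA = 0.7
trajectories = [rng.uniform(0.0, 1.0, size=50) for _ in range(40)]
labels = np.array([traj.max() < TRUE_THETA for traj in trajectories])

def robustness(traj, theta):
    """STL robustness of G(x_t < theta): min over time of (theta - x_t)."""
    return float(np.min(theta - traj))

def misclassification(params):
    """Outer objective: fraction of trajectories whose robustness sign
    under the candidate theta disagrees with the safety label."""
    theta = params[0]
    predicted_safe = np.array([robustness(t, theta) > 0 for t in trajectories])
    return float(np.mean(predicted_safe != labels))

# Bayesian optimization over the single pSTL parameter theta in [0, 1].
result = gp_minimize(misclassification, [(0.0, 1.0)], n_calls=30, random_state=0)
theta_hat = result.x[0]
print(f"estimated threshold: {theta_hat:.3f} (true: {TRUE_THETA})")

# In the full bilevel scheme, theta_hat would parameterize the safety
# constraint used by a Lagrangian TD3 policy update (inner level), and
# newly collected, labeled rollouts would refine the dataset before the
# next outer iteration.
```

The sketch uses label agreement as the outer objective purely for illustration; any measure of how well the parameterized specification separates safe from unsafe trajectories could play the same role.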
Pages: 266 - 281
Page count: 16
Related Papers
50 records in total
  • [1] On-policy concurrent reinforcement learning
    Banerjee, B
    Sen, S
    Peng, J
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2004, 16 (04) : 245 - 260
  • [2] Safety reinforcement learning control via transfer learning
    Zhang, Quanqi
    Wu, Chengwei
    Tian, Haoyu
    Gao, Yabin
    Yao, Weiran
    Wu, Ligang
    AUTOMATICA, 2024, 166
  • [3] A reinforcement learning approach for robot control in an unknown environment
    Xiao, NF
    Nahavandi, S
    IEEE ICIT' 02: 2002 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY, VOLS I AND II, PROCEEDINGS, 2002, : 1096 - 1099
  • [4] Reinforcement Learning Control for a Robotic Manipulator with Unknown Deadzone
    Li, Yanan
    Xiao, Shengtao
    Ge, Shuzhi Sam
    2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014, : 593 - 598
  • [5] Reinforcement Learning Control of Unknown Dynamic Systems
    Wu, QH
    Pugh, AC
    IEE PROCEEDINGS-D CONTROL THEORY AND APPLICATIONS, 1993, 140 (05) : 313 - 322
  • [6] Bridging Reinforcement Learning and Iterative Learning Control: Autonomous Motion Learning for Unknown, Nonlinear Dynamics
    Meindl, Michael
    Lehmann, Dustin
    Seel, Thomas
    FRONTIERS IN ROBOTICS AND AI, 2022, 9
  • [7] Integrating Classical Control into Reinforcement Learning Policy
    Huang, Ye
    Gu, Chaochen
    Guan, Xinping
    NEURAL PROCESSING LETTERS, 2021, 53 (03) : 1709 - 1722
  • [8] Policy Poisoning in Batch Reinforcement Learning and Control
    Ma, Yuzhe
    Zhang, Xuezhou
    Sun, Wen
    Zhu, Xiaojin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Towards Safe Reinforcement Learning with a Safety Editor Policy
    Yu, Haonan
    Xu, Wei
    Zhang, Haichao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,