Safety Probability Estimation in Dimension Reduction Space for Model-Free Safe Reinforcement Learning of Robotics

Citations: 0
Authors
Yu, Jianlan [1]
Liu, Qingchen [1]
Qin, Jiahu [1]
Han, Ruitian [1]
Yan, Chengzhen [1]
Affiliations
[1] Univ Sci & Technol China, Dept Automation, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Safety; Robots; Training; Switches; Reinforcement learning; Probabilistic logic; Estimation; Computational modeling; Dimensionality reduction; Quadrupedal robots; Machine learning for robot control; robot safety; reinforcement learning (RL); FRAMEWORK;
DOI
10.1109/LRA.2025.3566256
Chinese Library Classification number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Reinforcement learning (RL) for robotics requires careful attention to safety during training. However, a major challenge for safe reinforcement learning (SRL) methods is the curse of dimensionality. Although existing SRL methods that use dimensionality reduction (DR) can estimate the safety probability of a high-dimensional state from a low-dimensional safe region, their core safety estimators are not comprehensive. In this letter, we develop a novel, purely data-driven safety probability estimator that accounts for both the uncertainty from information loss caused by DR and the uncertainty from insufficient data inherent in data-driven methods. This estimator requires no manual parameter selection, and its estimates are fairly accurate even with a small amount of data. We theoretically prove that the estimator converges to the true safety probability. Existing RL algorithms (PPO, SAC) using this estimator can directly train control policies on real physical robots with significantly enhanced training safety. The effectiveness of the algorithm is verified through experiments on a quadruped robot in both real-world and simulation environments, where a 34D-to-2D safety estimator is implemented to guarantee an 83% success rate of the safe control policy.
Pages: 6312-6319
Page count: 8