Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning

Cited by: 3
Authors
Schneider, Lukas [1]
Frey, Jonas [1, 2]
Miki, Takahiro [1]
Hutter, Marco [1]
Affiliations
[1] Swiss Fed Inst Technol, Robot Syst Lab, Zurich, Switzerland
[2] Max Planck Inst Intelligent Syst, Tubingen, Germany
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024) | 2024
Funding
Swiss National Science Foundation;
Keywords
DOI
10.1109/ICRA57147.2024.10610137
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Deployment in hazardous environments requires robots to understand the risks associated with their actions and movements to prevent accidents. Despite their importance, these risks are not explicitly modeled by currently deployed locomotion controllers for legged robots. In this work, we propose a risk-sensitive locomotion training method employing distributional reinforcement learning to consider safety explicitly. Instead of relying on a value expectation, we estimate the complete value distribution to account for uncertainty in the robot's interaction with the environment. The value distribution is consumed by a risk metric to extract risk-sensitive value estimates. These are integrated into Proximal Policy Optimization (PPO) to derive our method, Distributional Proximal Policy Optimization (DPPO). The risk preference, ranging from risk-averse to risk-seeking, can be controlled by a single parameter, which makes it possible to adjust the robot's behavior dynamically. Importantly, our approach removes the need for additional reward function tuning to achieve risk sensitivity. We show emergent risk-sensitive locomotion behavior in simulation and on the quadrupedal robot ANYmal. Videos of the experiments and code are available at https://sites.google.com/leggedrobotics.com/risk-aware-locomotion.
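The abstract describes a risk metric that turns a value distribution into a scalar estimate, steered by a single risk parameter. The sketch below illustrates that idea only. The abstract does not name the risk metric, so a CVaR-style tail average over quantile samples is assumed here; the function name `risk_sensitive_value` and the signed parameter `beta` are hypothetical and are not taken from the authors' DPPO code.

```python
import numpy as np


def risk_sensitive_value(quantiles, beta):
    """Collapse quantile samples of a value distribution into one scalar.

    Assumption (not from the paper): a CVaR-style tail average.
    beta in (-1, 1) is a hypothetical risk parameter:
      beta < 0  -> average the lower tail (risk-averse)
      beta == 0 -> plain mean (risk-neutral)
      beta > 0  -> average the upper tail (risk-seeking)
    """
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between -1 and 1")
    q = np.sort(np.asarray(quantiles, dtype=float))
    # Number of quantile samples kept in the tail; at least one.
    k = max(1, int(np.ceil((1.0 - abs(beta)) * q.size)))
    tail = q[:k] if beta <= 0 else q[-k:]
    return float(tail.mean())


# Hypothetical quantile estimates of the return from one state.
samples = [0.0, 1.0, 2.0, 3.0]
neutral = risk_sensitive_value(samples, 0.0)
averse = risk_sensitive_value(samples, -0.5)
seeking = risk_sensitive_value(samples, 0.5)
```

In a PPO-style update, such a scalar could stand in for the usual expected value when computing advantages, which is one plausible reading of how a single parameter shifts behavior between cautious and bold without changing the reward function.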
Pages: 11451-11458
Page count: 8
Related papers
54 in total
[1]  
Agarwal A, 2022, PR MACH LEARN RES, V205, P403
[2]  
Barth-Maron G., 2018, 6 INT C LEARNING RE
[3]   Autonomous navigation of stratospheric balloons using reinforcement learning [J].
Bellemare, Marc G. ;
Candido, Salvatore ;
Castro, Pablo Samuel ;
Gong, Jun ;
Machado, Marlos C. ;
Moitra, Subhodeep ;
Ponda, Sameera S. ;
Wang, Ziyu .
NATURE, 2020, 588 (7836) :77-+
[4]  
Bellemare MG, 2017, PR MACH LEARN RES, V70
[5]  
Bellemare Marc G., 2017, arXiv
[6]  
Bernhard J, 2019, IEEE INT VEH SYM, P2148, DOI 10.1109/IVS.2019.8813791
[7]  
Bledt G, 2018, IEEE INT C INT ROBOT, P2245, DOI 10.1109/IROS.2018.8593885
[8]  
Bodnar C, 2020, ROBOTICS: SCIENCE AND SYSTEMS XVI
[9]   Autonomous Spot: Long-Range Autonomous Exploration of Extreme Environments with Legged Locomotion [J].
Bouman, Amanda ;
Ginting, Muhammad Fadhil ;
Alatur, Nikhilesh ;
Palieri, Matteo ;
Fan, David D. ;
Touma, Thomas ;
Pailevanian, Torkom ;
Kim, Sung-Kyun ;
Otsu, Kyohei ;
Burdick, Joel ;
Agha-Mohammadi, Ali-akbar .
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, :2518-2525
[10]  
Dabney W, 2018, PR MACH LEARN RES, V80