Terrain-Aware Risk-Assessment-Network-Aided Deep Reinforcement Learning for Quadrupedal Locomotion in Tough Terrain

Cited by: 3

Authors:

Zhang, Hongyin ^{[1,2]}

Wang, Jilong ^{[1,2,3]}

Wu, Zhengqing ^{[1,2]}

Wang, Yinuo ^{[1,2]}

Wang, Donglin ^{[1,2]}

Affiliations:

[1] Westlake Univ, Sch Engn, Machine Intelligence Lab MiLAB, Hangzhou 310024, Peoples R China

[2] Westlake Inst Adv Study, Inst Adv Technol, Hangzhou 310024, Peoples R China

[3] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA

Source:

2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2021

Keywords:

DOI:

10.1109/IROS51168.2021.9636519

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

When it comes to the control system of quadruped robots, deep reinforcement learning (DRL) is considered to be a promising solution. Despite years of development in this field, difficulties remain in guaranteeing the action stability of DRL-based quadruped robots' locomotion, especially in tough terrain. In this paper, a terrain-aware teacher-student controller integrating a risk assessment network (RAN) is proposed to alleviate this problem. During the training phase, the RAN can evaluate the risk level of historical observations or the current state and further guide the update of the policy, thereby assisting the policy in selecting better actions and avoiding risky ones. Furthermore, the real-time elevation map is transmitted to the controller as visual information, so that it can perceive the terrain to produce higher-performance locomotion. With the aforementioned configuration, we enable a robot to traverse various challenging terrains in simulation and bound or trot stably in the real environment.

Pages: 4538-4545

Page count: 8
