Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition

Cited: 88
Authors
Siekmann, Jonah [1]
Godse, Yesh [1]
Fern, Alan [1]
Hurst, Jonathan [1]
Affiliations
[1] Oregon State Univ, Collaborat Robot & Intelligent Syst Inst, Corvallis, OR 97331 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021) | 2021
DOI
10.1109/ICRA48506.2021.9561814
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline Code
0812
Abstract
We study the problem of realizing the full spectrum of bipedal locomotion on a real robot with sim-to-real reinforcement learning (RL). A key challenge of learning legged locomotion is describing different gaits, via reward functions, in a way that is intuitive for the designer and specific enough to reliably learn the gait across different initial random seeds or hyperparameters. A common approach is to use reference motions (e.g. trajectories of joint positions) to guide learning. However, finding high-quality reference motions can be difficult and the trajectories themselves narrowly constrain the space of learned motion. At the other extreme, reference-free reward functions are often underspecified (e.g. move forward) leading to massive variance in policy behavior, or are the product of significant reward-shaping via trial-and-error, making them exclusive to specific gaits. In this work, we propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities. We instantiate this framework to define a parametric reward function with intuitive settings for all common bipedal gaits - standing, walking, hopping, running, and skipping. Using this function we demonstrate successful sim-to-real transfer of the learned gaits to the bipedal robot Cassie, as well as a generic policy that can transition between all of the two-beat gaits.
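The framework the abstract describes composes per-foot periodic costs: during a foot's swing phase, ground-reaction force is penalized; during stance, foot velocity is penalized, and per-foot phase offsets select the gait. The sketch below is a highly simplified, deterministic illustration of that idea (the paper itself uses probabilistic phase indicators with learned expectations); all function names, the fixed swing ratio, and the offset values are illustrative assumptions, not the paper's implementation.

```python
def phase_indicator(phi, start, end):
    """Return 1.0 if cyclic phase phi (in [0, 1)) lies in [start, end),
    wrapping around 1.0; else 0.0."""
    phi = phi % 1.0
    if start <= end:
        return 1.0 if start <= phi < end else 0.0
    return 1.0 if (phi >= start or phi < end) else 0.0

def foot_cost(phi, foot_force, foot_speed, swing_start, swing_end):
    """Penalize ground force during swing and foot speed during stance."""
    in_swing = phase_indicator(phi, swing_start, swing_end)
    in_stance = 1.0 - in_swing
    return -(in_swing * foot_force + in_stance * foot_speed)

def gait_reward(phi, feet, offsets, swing_ratio=0.5):
    """Compose per-foot periodic costs on one shared clock phi.

    `feet` is a list of (ground_force, foot_speed) pairs; `offsets`
    shifts each foot's phase, which is what distinguishes gaits
    (e.g. [0.0, 0.5] for anti-phase walking, [0.0, 0.0] for hopping).
    """
    total = 0.0
    for (force, speed), offset in zip(feet, offsets):
        foot_phi = (phi + offset) % 1.0
        total += foot_cost(foot_phi, force, speed, 0.0, swing_ratio)
    return total
```

For example, with a walking offset of 0.5, the two feet alternate: while one foot is in swing (its contact force is penalized), the other is in stance (its velocity is penalized), which matches the intuition that a walking reward should discourage force on the airborne foot and slip on the planted foot.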
Pages: 7309-7315
Page count: 7