Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition

Cited: 88
Authors
Siekmann, Jonah [1]
Godse, Yesh [1]
Fern, Alan [1]
Hurst, Jonathan [1]
Affiliations
[1] Oregon State Univ, Collaborat Robot & Intelligent Syst Inst, Corvallis, OR 97331 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021) | 2021
DOI
10.1109/ICRA48506.2021.9561814
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline Code
0812
Abstract
We study the problem of realizing the full spectrum of bipedal locomotion on a real robot with sim-to-real reinforcement learning (RL). A key challenge of learning legged locomotion is describing different gaits, via reward functions, in a way that is intuitive for the designer and specific enough to reliably learn the gait across different initial random seeds or hyperparameters. A common approach is to use reference motions (e.g. trajectories of joint positions) to guide learning. However, finding high-quality reference motions can be difficult and the trajectories themselves narrowly constrain the space of learned motion. At the other extreme, reference-free reward functions are often underspecified (e.g. move forward) leading to massive variance in policy behavior, or are the product of significant reward-shaping via trial-and-error, making them exclusive to specific gaits. In this work, we propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities. We instantiate this framework to define a parametric reward function with intuitive settings for all common bipedal gaits - standing, walking, hopping, running, and skipping. Using this function we demonstrate successful sim-to-real transfer of the learned gaits to the bipedal robot Cassie, as well as a generic policy that can transition between all of the two-beat gaits.
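The framework the abstract describes composes per-foot periodic costs: during a foot's swing phase, ground-reaction force is penalized; during stance, foot velocity is penalized, and per-foot phase offsets select the gait. The sketch below is a highly simplified, deterministic illustration of that idea (the paper itself uses probabilistic phase indicators with learned expectations); all function names, the fixed swing ratio, and the offset values are illustrative assumptions, not the paper's implementation.

```python
def phase_indicator(phi, start, end):
    """Return 1.0 if cyclic phase phi (in [0, 1)) lies in [start, end),
    wrapping around 1.0; else 0.0."""
    phi = phi % 1.0
    if start <= end:
        return 1.0 if start <= phi < end else 0.0
    return 1.0 if (phi >= start or phi < end) else 0.0

def foot_cost(phi, foot_force, foot_speed, swing_start, swing_end):
    """Penalize ground force during swing and foot speed during stance."""
    in_swing = phase_indicator(phi, swing_start, swing_end)
    in_stance = 1.0 - in_swing
    return -(in_swing * foot_force + in_stance * foot_speed)

def gait_reward(phi, feet, offsets, swing_ratio=0.5):
    """Compose per-foot periodic costs on one shared clock phi.

    `feet` is a list of (ground_force, foot_speed) pairs; `offsets`
    shifts each foot's phase, which is what distinguishes gaits
    (e.g. [0.0, 0.5] for anti-phase walking, [0.0, 0.0] for hopping).
    """
    total = 0.0
    for (force, speed), offset in zip(feet, offsets):
        foot_phi = (phi + offset) % 1.0
        total += foot_cost(foot_phi, force, speed, 0.0, swing_ratio)
    return total
```

For example, with a walking offset of 0.5, the two feet alternate: while one foot is in swing (its contact force is penalized), the other is in stance (its velocity is penalized), which matches the intuition that a walking reward should discourage force on the airborne foot and slip on the planted foot.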
Pages: 7309-7315
Page count: 7