Learning variable impedance control

Cited by: 271
Authors
Buchli, Jonas [1 ,2 ]
Stulp, Freek [2 ]
Theodorou, Evangelos [2 ]
Schaal, Stefan [2 ]
Affiliations
[1] Italian Inst Technol, Dept Adv Robot, I-16163 Genoa, Italy
[2] Univ So Calif, Computat Learning & Motor Control Lab, Los Angeles, CA USA
Funding
US National Science Foundation (NSF)
Keywords
Reinforcement learning; variable impedance control; gain scheduling; motion primitives; compliant control; stochastic optimal control; manipulation
DOI
10.1177/0278364911402527
CLC Number
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus focus fully on designing the cost function that specifies the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it scales to problems with many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and show how it is applied to gain scheduling for variable impedance control. We evaluate our approach on several simulated and real robots, considering tasks that involve accurate tracking through via points and manipulation tasks that require physical contact with the environment. In these tasks, the optimal strategy requires tuning both a reference trajectory and the impedance of the end-effector. The results show that path-integral-based reinforcement learning can be used not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
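To make the abstract's description concrete, the following is a minimal, hypothetical sketch of the cost-weighted averaging at the core of PI2, applied to jointly learning a reference trajectory and a time-varying stiffness schedule for a toy 1-DOF point mass. This is not the authors' implementation: the simulation, cost terms, parameterization, and all names (run_rollout, theta_ref, theta_gain, etc.) are illustrative assumptions.

```python
import numpy as np

# Toy PI2-style learning of variable impedance (hypothetical sketch).
# A 1-DOF unit point mass is tracked by a PD controller whose reference
# trajectory and proportional-gain (stiffness) schedule are both learned.

rng = np.random.default_rng(0)
n_steps, dt = 100, 0.01
goal = 1.0

def run_rollout(theta_ref, theta_gain):
    """Simulate the point mass and return the total rollout cost."""
    x, xd, cost = 0.0, 0.0, 0.0
    for t in range(n_steps):
        k_p = max(theta_gain[t], 0.0)            # time-varying stiffness
        # PD control with (approximately) critical damping for unit mass.
        u = k_p * (theta_ref[t] - x) - 2.0 * np.sqrt(max(k_p, 1e-6)) * xd
        xd += u * dt                              # unit-mass dynamics
        x += xd * dt
        cost += 1e-4 * k_p * dt                   # penalize high gains (compliance)
    cost += 1e3 * (x - goal) ** 2                 # terminal tracking cost
    return cost

# Parameters: per-time-step reference and gain schedule (a crude basis).
theta_ref = np.full(n_steps, goal)
theta_gain = np.full(n_steps, 50.0)
lam, sigma, K = 1.0, 2.0, 20                      # temperature, noise, rollouts

for update in range(100):
    # Sample K exploration-noise vectors for both parameter sets.
    eps_ref = sigma * rng.standard_normal((K, n_steps))
    eps_gain = sigma * rng.standard_normal((K, n_steps))
    costs = np.array([run_rollout(theta_ref + eps_ref[k],
                                  theta_gain + eps_gain[k])
                      for k in range(K)])
    # Cost-weighted averaging: exponentiate normalized negative costs.
    s = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    w = np.exp(-s / lam)
    w /= w.sum()
    theta_ref += w @ eps_ref                      # probability-weighted noise
    theta_gain += w @ eps_gain

print("final cost:", run_rollout(theta_ref, theta_gain))
```

The min-max normalization of costs before exponentiation follows common PI2 implementation practice and makes the temperature parameter insensitive to the absolute scale of the cost; the gain-penalty term illustrates how a cost function alone can drive the trade-off between stiff, accurate tracking and compliance.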
Pages: 820-833 (14 pages)
References (41 in total)
[1] [Anonymous], 2000, DYNAMICS MODELLING C.
[2] Başar, T., 1995, H-infinity Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach.
[3] Buchli, J., 2010, Proceedings of Robotics: Science and Systems 2010.
[4] Buchli, J., 2009, IEEE/RSJ International Conference on Intelligent Robots and Systems, p. 814.
[5] Burdet, E., Tee, K.P., Mareels, I., Milner, T.E., Chew, C.M., Franklin, D.W., Osu, R., Kawato, M. Stability and motor adaptation in human arm movements. Biological Cybernetics, 2006, 94(1): 20-32.
[6] Caflisch, R.E. Acta Numerica, Vol. 7, p. 1.
[7] Cheng, G., Hyon, S.-H., Morimoto, J., Ude, A., Hale, J.G., Colvin, G., Scroggin, W., Jacobsen, S.C. CB: a humanoid research platform for exploring neuroscience. Advanced Robotics, 2007, 21(10): 1097-1114.
[8] Fletcher, R., 1981, Practical Methods of Optimization, Vol. 2: Constrained Optimization.
[9] Hogan, N., 2006, Advances in Robot Control.
[10] Hogan, N. Impedance control: an approach to manipulation, Part I: Theory. Journal of Dynamic Systems, Measurement, and Control (Transactions of the ASME), 1985, 107(1): 1-7.