Constraint Estimation and Derivative-Free Recovery for Robot Learning from Demonstrations

Cited by: 0

Authors:

Lee, Jonathan [1]

Laskey, Michael [1]

Fox, Roy [1]

Goldberg, Ken [1,2]

Affiliations:

[1] Univ Calif Berkeley, AUTOLAB, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, AUTOLAB, Dept Ind Engn & Operat Res, Berkeley, CA 94720 USA

Source:

2018 IEEE 14TH INTERNATIONAL CONFERENCE ON AUTOMATION SCIENCE AND ENGINEERING (CASE) | 2018

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Learning from human demonstrations can facilitate automation but is risky because the execution of the learned policy might lead to collisions and other failures. Adding explicit constraints to avoid unsafe states is generally not possible when the state representations are complex. Furthermore, enforcing these constraints during execution of the learned policy can be challenging in environments where dynamics are difficult to model such as push mechanics in grasping. In this paper, we propose Derivative-Free Recovery (DFR), a two-phase method for generating robust policies from demonstrations in robotic manipulation tasks where the system comes to rest at each time step. In the first phase, we use support estimation of supervisor demonstrations and treat the support as implicit constraints on states. We also propose a time-varying modification for sequential tasks. In the second phase, we use this support estimate to derive a switching policy that employs the learned policy in the interior of the support and switches to a recovery policy to steer the robot away from the boundary of the support if it drifts too close. We present additional conditions, which linearly bound the difference in state at each time step by the magnitude of control, allowing us to prove that the robot will not violate the constraints using the recovery policy. A simulated pushing task in MuJoCo suggests that DFR can reduce collisions by 83%. On a physical line tracking task using a da Vinci Surgical Robot and a moving Stewart platform, DFR reduced collisions by 84%.
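
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian-kernel support score, the fixed set of probe directions, and all names and parameter values (`make_support_score`, `recovery_control`, `switching_policy`, `bandwidth`, `threshold`, `margin`, `step`, `step_fn`) are assumptions chosen for illustration. The `step_fn` argument stands in for executing a small control on the real system and observing where it comes to rest, since the dynamics are assumed to be unknown.

```python
import math


def make_support_score(demos, bandwidth=0.5, threshold=0.3):
    """Phase 1 (assumed form): return g(x), positive inside the estimated support.

    g(x) is the mean Gaussian-kernel similarity between x and the demonstrated
    states, minus a threshold. This is a stand-in for a support estimator.
    """
    def score(x):
        total = 0.0
        for d in demos:
            sq_dist = sum((xi - di) ** 2 for xi, di in zip(x, d))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / len(demos) - threshold
    return score


def recovery_control(x, score, step, step_fn, n_dirs=16):
    """Derivative-free recovery: probe small controls, keep the best one.

    No gradient of the score or of the dynamics is used. Each candidate
    control is evaluated only through its observed outcome step_fn(x, u).
    """
    best_u, best_val = (0.0, 0.0), score(x)
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        u = (step * math.cos(angle), step * math.sin(angle))
        val = score(step_fn(x, u))
        if val > best_val:
            best_u, best_val = u, val
    return best_u


def switching_policy(x, learned_policy, score, margin, step, step_fn):
    """Phase 2: use the learned policy in the interior, recover near the boundary."""
    if score(x) > margin:
        return learned_policy(x)
    return recovery_control(x, score, step, step_fn)


# Hypothetical toy setup: demonstrations lie on the segment y = 0, 0 <= x <= 1.
demos = [(i / 10.0, 0.0) for i in range(11)]
score = make_support_score(demos)


def learned_policy(x):
    # Placeholder for a policy trained on the demonstrations.
    return (0.05, 0.0)


def step_fn(x, u):
    # Placeholder dynamics: the state moves by exactly the control.
    return (x[0] + u[0], x[1] + u[1])


inside = switching_policy((0.5, 0.0), learned_policy, score, 0.1, 0.1, step_fn)
drifted = switching_policy((0.5, 1.0), learned_policy, score, 0.1, 0.1, step_fn)
print("control inside support:", inside)
print("control after drifting:", drifted)
```

In this toy setup the state `(0.5, 0.0)` lies well inside the estimated support, so the learned policy acts, while `(0.5, 1.0)` has drifted away and triggers the recovery search. Keeping the probe magnitude `step` small mirrors the abstract's condition that the change in state is bounded linearly by the magnitude of control.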

Pages: 270-277

Page count: 8
