Learning Robot Motion Control with Demonstration and Advice-Operators

Cited by: 37
Authors
Argall, Brenna D. [1 ]
Browning, Brett [1 ]
Veloso, Manuela [2 ]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Source
2008 IEEE/RSJ INTERNATIONAL CONFERENCE ON ROBOTS AND INTELLIGENT SYSTEMS, VOLS 1-3, CONFERENCE PROCEEDINGS | 2008
DOI
10.1109/IROS.2008.4651020
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As robots become more commonplace within society, the need for tools that enable non-robotics-experts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, where the robot learns a policy from teacher task executions. Our interests lie with robot motion control policies, which map world observations to continuous low-level actions. In this work, we introduce Advice-Operator Policy Improvement (A-OPI) as a novel approach for improving policies within LfD. Two distinguishing characteristics of the A-OPI algorithm are its data source and its continuous state-action space. Within LfD, more example data can improve a policy. In A-OPI, new data is synthesized from a student execution and teacher advice. By contrast, typical demonstration approaches provide the learner exclusively with teacher executions. A-OPI is effective within continuous state-action spaces because high-level human advice is translated into continuous-valued corrections on the student execution. This work presents a first implementation of the A-OPI algorithm, validated on a Segway RMP robot performing a spatial positioning task. A-OPI is found to improve task performance, in both success and accuracy. Furthermore, performance is shown to be similar or superior to that of the typical approach, which relies exclusively on teacher demonstrations.
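The data-synthesis idea in the abstract (a student execution plus teacher advice yields new training examples) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the 1-nearest-neighbor learner, the `scale_speed` operator, the one-dimensional observation, and the two-component action (translational speed, rotational speed).

```python
from typing import Callable, List, Tuple

Observation = Tuple[float, ...]
Action = Tuple[float, ...]
Sample = Tuple[Observation, Action]
AdviceOperator = Callable[[Observation, Action], Action]


def nearest_neighbor_policy(dataset: List[Sample], obs: Observation) -> Action:
    """Return the action of the stored sample closest to obs.

    1-NN is an assumed stand-in learner, chosen only to keep the sketch short.
    """
    def sq_dist(sample: Sample) -> float:
        return sum((a - b) ** 2 for a, b in zip(sample[0], obs))
    return min(dataset, key=sq_dist)[1]


def scale_speed(factor: float) -> AdviceOperator:
    """Hypothetical advice-operator: scale action[0], keep other components."""
    def op(obs: Observation, action: Action) -> Action:
        return (action[0] * factor,) + tuple(action[1:])
    return op


def synthesize(execution: List[Sample],
               segment: Tuple[int, int],
               operator: AdviceOperator) -> List[Sample]:
    """Apply the operator to the advised segment of a student execution.

    The observations are the ones the student actually visited; only the
    actions are corrected, so no new teacher demonstration is needed.
    """
    start, stop = segment
    return [(obs, operator(obs, act)) for obs, act in execution[start:stop]]


# Initial dataset, standing in for teacher demonstrations.
dataset: List[Sample] = [((0.0,), (1.0, 0.0)), ((1.0,), (1.0, 0.0))]

# Student executes the current policy and records (observation, action) pairs.
visited = [(0.8,), (0.9,), (1.0,)]
execution = [(obs, nearest_neighbor_policy(dataset, obs)) for obs in visited]

# Teacher advice: "slow down" over the last two steps of the execution.
new_data = synthesize(execution, (1, 3), scale_speed(0.5))
dataset.extend(new_data)  # the policy is re-derived from the enlarged dataset
```

In this sketch the advice is a continuous-valued correction (halving a speed) rather than a discrete label, which is the property the abstract credits for A-OPI's suitability to continuous state-action spaces.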
Pages: 399-404
Page count: 6