Bayesian Disturbance Injection: Robust imitation learning of flexible policies for robot manipulation

Cited by: 7
Authors
Oh, Hanbit [1]
Sasaki, Hikaru [1]
Michael, Brendan [1]
Matsubara, Takamitsu [1]
Affiliation
[1] Nara Institute of Science and Technology (NAIST), Graduate School of Science and Technology, Division of Information Science, 8916-5 Takayama-cho, Ikoma City, Nara 6300192, Japan
Keywords
Imitation learning; Disturbance injection; Human behavior characteristics; Robotic manipulation; Gaussian processes; Task; Sensitivity
DOI
10.1016/j.neunet.2022.11.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., a single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms around human behavioral characteristics, thereby embodying principles for capturing and exploiting the actual behavioral characteristics of demonstrators. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while policies are simultaneously robustified by injecting risk-sensitive disturbances that induce human recovery actions and ensure demonstration feasibility. The method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., a table-sweep task, a shaft-reach task, and a shaft-insertion task) using a UR5e 6-DOF robotic arm, demonstrating an improved characterization of behavior. Results show significant improvements in task performance through improved flexibility, robustness, and demonstration feasibility.
© 2022 Elsevier Ltd. All rights reserved.
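The core idea of disturbance injection described in the abstract, recording a demonstrator's corrective actions while disturbances push the system off its nominal trajectory, can be illustrated with a small Python sketch. This is not the BDI algorithm: it is a generic disturbance-injection loop in the spirit of DART, on a hypothetical 1-D reaching task. The demonstrator (`demonstrator`), the hazard model (`risk`), the noise scale `sigma_max`, and the linear least-squares learner (`fit_linear_policy`, a stand-in for the paper's non-parametric Bayesian multi-action policy) are all illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (visited state, demonstrator's intended action)


def demonstrator(x: float, goal: float = 1.0, gain: float = 0.5) -> float:
    """Hypothetical expert: a proportional controller that steers x toward goal."""
    return gain * (goal - x)


def risk(x: float, hazard: float = 1.2, width: float = 0.3) -> float:
    """Hypothetical risk score in [0, 1]: 1 at the hazard, 0 farther than `width`."""
    return max(0.0, 1.0 - abs(hazard - x) / width)


def collect_with_disturbance(
    policy: Callable[[float], float],
    sigma_max: float,
    steps: int = 20,
    episodes: int = 5,
    seed: int = 0,
) -> List[Sample]:
    """Roll out `policy`, adding Gaussian noise to the *executed* action only.

    Each stored label is the undisturbed action, so states reached through a
    disturbance are paired with the demonstrator's recovery action.
    """
    rng = random.Random(seed)
    data: List[Sample] = []
    for _ in range(episodes):
        x = 0.0
        for _ in range(steps):
            u = policy(x)
            data.append((x, u))
            # Assumed risk-sensitive schedule: smaller disturbances near the hazard.
            sigma = sigma_max * (1.0 - risk(x))
            x = x + u + rng.gauss(0.0, sigma)
    return data


def fit_linear_policy(data: List[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of u = a * x + b (a simple stand-in learner)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_u = sum(u for _, u in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in data)
    a = sxu / sxx
    return a, mean_u - a * mean_x


if __name__ == "__main__":
    clean = collect_with_disturbance(demonstrator, sigma_max=0.0)
    noisy = collect_with_disturbance(demonstrator, sigma_max=0.2)
    print("slope, intercept (noise-free rollouts):", fit_linear_policy(clean))
    print("slope, intercept (disturbed rollouts): ", fit_linear_policy(noisy))
```

Because every label is the demonstrator's undisturbed action at a possibly disturbed state, the data set covers off-trajectory states together with their recovery actions. Shrinking `sigma` near the hazard is only an assumed way to mimic the abstract's notion of risk-sensitive disturbances that keep demonstrations feasible.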
Pages: 42-58
Number of pages: 17