Objective learning from human demonstrations

Cited by: 6
Authors
Lin, Jonathan Feng-Shun [1]
Carreno-Medrano, Pamela [2]
Parsapour, Mahsa [3]
Sakr, Maram [2, 4]
Kulic, Dana [2]
Affiliations
[1] Univ Waterloo, Syst Design Engn, Waterloo, ON, Canada
[2] Monash Univ, Fac Engn, Clayton, Vic, Australia
[3] Univ Waterloo, Elect & Comp Engn, Waterloo, ON, Canada
[4] Univ British Columbia, Mech Engn, Vancouver, BC, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Reward learning; Inverse optimal control; Inverse reinforcement learning; INVERSE OPTIMAL-CONTROL; COST-FUNCTIONS; GENERATION; ROBOT;
DOI
10.1016/j.arcontrol.2021.04.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Researchers in biomechanics, neuroscience, human-machine interaction and other fields are interested in inferring human intentions and objectives from observed actions. The problem of inferring objectives from observations has received extensive theoretical and methodological development from both the controls and machine learning communities. In this paper, we provide an integrating view of objective learning from human demonstration data. We differentiate algorithms based on the assumptions made about the objective function structure, how the similarity between the inferred objectives and the observed demonstrations is assessed, the assumptions made about the agent and environment model, and the properties of the observed human demonstrations. We review the application domains and validation approaches of existing works and identify the key open challenges and limitations. The paper concludes with an identification of promising directions for future work.
Pages: 111-129
Number of pages: 19
Related papers (50 in total)
  • [21] Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement
    Ezzeddine, Ali
    Mourad, Nafee
    Araabi, Babak Nadjar
    Ahmadabadi, Majid Nili
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 112: 331-341
  • [22] An Unified Approach to Inverse Reinforcement Learning by Oppositive Demonstrations
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Tseng, Yi-Chia
    PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2016: 1664-1668
  • [23] Label-Free Adaptive Gaussian Sample Consensus Framework for Learning From Perfect and Imperfect Demonstrations
    Hu, Yi
    Samadikhoshkho, Zahra
    Jin, Jun
    Tavakoli, Mahdi
    IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2024, 6(03): 1093-1103
  • [24] Trajectory Learning by Therapists' Demonstrations for an Upper Limb Rehabilitation Exoskeleton
    Luciani, Beatrice
    Roveda, Loris
    Braghin, Francesco
    Pedrocchi, Alessandra
    Gandolla, Marta
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8(08): 4561-4568
  • [25] Batch Active Learning of Reward Functions from Human Preferences
    Biyik, Erdem
    Anari, Nima
    Sadigh, Dorsa
    ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2024, 13(02)
  • [26] A Novel Teacher-Assistance-Based Method to Detect and Handle Bad Training Demonstrations in Learning From Demonstration
    Li, Qin
    Wang, Yong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14(03): 948-956
  • [27] Preliminary experiments in motion programming of humanoid robot by human demonstrations
    Konno, A
    Yoshiike, T
    Nagashima, K
    Inaba, M
    Inoue, H
    JSME INTERNATIONAL JOURNAL SERIES C-MECHANICAL SYSTEMS MACHINE ELEMENTS AND MANUFACTURING, 2000, 43(02): 401-407
  • [28] Learning From Human Directional Corrections
    Jin, Wanxin
    Murphey, Todd D.
    Lu, Zehui
    Mou, Shaoshuai
    IEEE TRANSACTIONS ON ROBOTICS, 2023, 39(01): 625-644
  • [29] Individual Human Behavior Identification Using an Inverse Reinforcement Learning Method
    Inga, Jairo
    Koepf, Florian
    Flad, Michael
    Hohmann, Soeren
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017: 99-104
  • [30] Learning from Approximate Human Decisions by a Robot
    Jayawardena, Chandimal
    Watanabe, Keigo
    Izumi, Kiyotaka
    JOURNAL OF ROBOTICS AND MECHATRONICS, 2007, 19(01): 68-76