Recent advances in leveraging human guidance for sequential decision-making tasks

Cited by: 7
Authors
Zhang, Ruohan [1]
Torabi, Faraz [1]
Warnell, Garrett [2]
Stone, Peter [1,3]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] US Army, Res Lab, Adelphi, MD USA
[3] Sony AI, Austin, TX USA
Funding
National Science Foundation (USA);
Keywords
Learning from demonstration; Imitation learning; Reinforcement learning; Human feedback; Hierarchical learning; Imitation from observation; Attention; ARCADE LEARNING-ENVIRONMENT; TRAJECTORY-TRACKING; REINFORCEMENT; GAZE; IMITATION; LEVEL; GO; BEHAVIORS; FRAMEWORK; ROBOTICS;
DOI
10.1007/s10458-021-09514-w
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making. Importantly, while it is the artificial agent that learns and acts, it is still up to humans to specify the particular task to be performed. Classical task-specification approaches typically involve humans providing stationary reward functions or explicit demonstrations of the desired tasks. However, there has recently been a great deal of research energy invested in exploring alternative ways in which humans may guide learning agents that may, e.g., be more suitable for certain tasks or require less human effort. This survey provides a high-level overview of five recent machine learning frameworks that primarily rely on human guidance apart from pre-specified reward functions or conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework, and we discuss possible future research directions.
Pages: 39