Recent advances in leveraging human guidance for sequential decision-making tasks

Cited by: 7

Authors:

Zhang, Ruohan [1]

Torabi, Faraz [1]

Warnell, Garrett [2]

Stone, Peter [1,3]

Affiliations:

[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA

[2] US Army, Res Lab, Adelphi, MD USA

[3] Sony AI, Austin, TX USA

Source:

AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS | 2021 / Volume 35 / Issue 02

Funding:

US National Science Foundation;

Keywords:

Learning from demonstration; Imitation learning; Reinforcement learning; Human feedback; Hierarchical learning; Imitation from observation; Attention; ARCADE LEARNING-ENVIRONMENT; TRAJECTORY-TRACKING; REINFORCEMENT; GAZE; IMITATION; LEVEL; GO; BEHAVIORS; FRAMEWORK; ROBOTICS;

DOI:

10.1007/s10458-021-09514-w

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making. Importantly, while it is the artificial agent that learns and acts, it is still up to humans to specify the particular task to be performed. Classical task-specification approaches typically involve humans providing stationary reward functions or explicit demonstrations of the desired tasks. However, there has recently been a great deal of research energy invested in exploring alternative ways in which humans may guide learning agents that may, e.g., be more suitable for certain tasks or require less human effort. This survey provides a high-level overview of five recent machine learning frameworks that primarily rely on human guidance apart from pre-specified reward functions or conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework, and we discuss possible future research directions.

References

Pages: 39

222 entries in total
