Human Feedback as Action Assignment in Interactive Reinforcement Learning

Cited by: 6

Authors:

Raza, Syed Ali [1]

Williams, Mary-Anne [1]

Affiliations:

[1] University of Technology Sydney, Faculty of Engineering and Information Technology, 81 Broadway, Ultimo, NSW 2007, Australia

Source:

ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS | 2020 / Volume 14 / Issue 4

Funding:

Australian Research Council;

Keywords:

Interactive machine learning; reinforcement learning; reward shaping; learning from human teachers; ROBOT;

DOI:

10.1145/3404197

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Teaching by demonstrations and teaching by assigning rewards are two popular methods of knowledge transfer in humans. However, showing the right behaviour (by demonstration) may appear more natural to a human teacher than assessing the learner's performance and assigning a reward or punishment to it. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method that replaces the traditional method of reward assignment with action assignment (which is similar to providing a demonstration) in interactive reinforcement learning. The main purpose of the suggested action is to compute a reward by seeing if the suggested action was followed by the self-acting agent or not. We compared action assignment with reward assignment via a user study conducted over the web using a two-dimensional maze game. The logs of interactions showed that action assignment significantly improved users' ability to teach the right behaviour. The survey results showed that both action and reward assignment seemed highly natural and usable, that reward assignment required more mental effort, that repeatedly assigning rewards and seeing the agent disobey commands caused frustration in users, and that many users desired to control the agent's behaviour directly.
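The core idea in the abstract, deriving a reward from whether the agent followed the teacher's suggested action, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the function names, the reward values of +1 and -1, the zero reward when no suggestion is given, and the use of a plain tabular Q-learning update.

```python
from collections import defaultdict
from typing import Optional


def reward_from_action_assignment(
    agent_action: str,
    suggested_action: Optional[str],
    match_reward: float = 1.0,      # assumed value, not from the article
    mismatch_reward: float = -1.0,  # assumed value, not from the article
) -> float:
    """Turn a teacher's suggested action into a scalar reward.

    The reward depends only on whether the self-acting agent's chosen
    action matched the suggestion. With no suggestion, this sketch
    returns a neutral reward of 0.0 (an assumption).
    """
    if suggested_action is None:
        return 0.0
    return match_reward if agent_action == suggested_action else mismatch_reward


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, used here only as a stand-in learner."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical maze step: the agent moved "left" while the teacher suggested "up".
actions = ["up", "down", "left", "right"]
q = defaultdict(float)
r = reward_from_action_assignment("left", "up")
q_update(q, (0, 0), "left", r, (0, 0), actions)
```

In this sketch the teacher never scores the agent directly; the suggestion is compared with the action the agent actually took, and the resulting match or mismatch is what the learner sees as reward.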

Pages: 24
