I2RL: online inverse reinforcement learning under occlusion

Cited by: 0
Authors
Saurabh Arora
Prashant Doshi
Bikramjit Banerjee
Affiliations
[1] University of Georgia, THINC Lab, Department of Computer Science, 415 Boyd GSRC
[2] University of Southern Mississippi, School of Computing Sciences and Computer Engineering
Source
Autonomous Agents and Multi-Agent Systems | 2021 / Volume 35
Keywords
Robot learning; Online learning; Robotics; Reinforcement learning; Inverse reinforcement learning;
DOI
Not available
Abstract
Inverse reinforcement learning (IRL) is the problem of learning an agent's preferences from observations of its behavior on a task. It inverts RL, which focuses on learning an agent's behavior on a task from the reward signals received. IRL is receiving sustained attention due to promising applications in robotics, computer games, finance, and other sectors. Methods for IRL have, for the most part, focused on batch settings in which the observed agent's behavioral data has already been collected. However, the related problem of online IRL, in which observations accrue incrementally yet the application's real-time demands often prohibit a full rerun of an IRL method, has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into it. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data, as well as probabilistically bounded error, under both full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application, situated in worlds of varied sizes and involving learning under high levels of occlusion, show significantly improved performance of I2RL compared with both batch IRL and an online imitation learning method.
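The abstract's key technical idea, advancing maximum-entropy IRL to an online, session-based setting, can be illustrated with a short sketch. The Python below is a hedged illustration, not the authors' algorithm: it assumes a small discrete MDP with known dynamics P[s, a, s'], a linear reward theta . phi(s), and fully observed demonstrations arriving one session at a time. The function names (soft_value_iteration, expected_features, i2rl_session) are hypothetical, and the occlusion handling that the paper contributes (marginalizing over hidden time steps) is omitted for brevity.

import numpy as np

def soft_value_iteration(P, phi, theta, gamma=0.95, iters=100):
    # Soft (MaxEnt) Bellman backups; returns a stochastic policy pi[s, a].
    n_states, n_actions, _ = P.shape
    r = phi @ theta                          # linear reward per state
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)     # Q[s, a]
        V = np.log(np.exp(Q).sum(axis=1))    # log-sum-exp backup
    return np.exp(Q - V[:, None])

def expected_features(P, pi, phi, start, gamma=0.95, iters=100):
    # Discounted feature expectations of the learner's current policy.
    d = start.astype(float).copy()
    mu = np.zeros(phi.shape[1])
    for _ in range(iters):
        mu += d @ phi
        d = gamma * np.einsum('s,sa,sap->p', d, pi, P)
    return mu

def i2rl_session(theta, mu_hat, n_sessions, demo, phi, P, start,
                 lr=0.1, gamma=0.95):
    # Fold the new session's demonstration (a list of state indices) into
    # the running mean of empirical feature expectations, the sufficient
    # statistic carried between sessions, then take one gradient step on
    # the MaxEnt log-likelihood. Under occlusion, hidden time steps would
    # be marginalized out here; this sketch assumes full observability.
    mu_demo = sum(gamma**t * phi[s] for t, s in enumerate(demo))
    mu_hat = (n_sessions * mu_hat + mu_demo) / (n_sessions + 1)
    pi = soft_value_iteration(P, phi, theta, gamma)
    grad = mu_hat - expected_features(P, pi, phi, start, gamma)
    return theta + lr * grad, mu_hat, n_sessions + 1

The point of the sketch is the running mean mu_hat: each session folds its demonstration into this sufficient statistic and takes a single gradient step, so the per-session cost stays constant rather than growing with the accumulated data, which is what makes a full rerun of batch IRL unnecessary.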
Related papers
29 items in total
  • [1] I2RL: online inverse reinforcement learning under occlusion
    Arora, Saurabh
    Doshi, Prashant
    Banerjee, Bikramjit
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2021, 35 (01)
  • [2] Online Inverse Reinforcement Learning Under Occlusion
    Arora, Saurabh
    Doshi, Prashant
    Banerjee, Bikramjit
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1170 - 1178
  • [3] Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions
    Bogert, Kenneth
    Doshi, Prashant
    ARTIFICIAL INTELLIGENCE, 2018, 263 : 46 - 73
  • [5] Convergence analysis of an incremental approach to online inverse reinforcement learning
    Jin, Zhuo-jun
    Qian, Hui
    Chen, Shen-yi
    Zhu, Miao-liang
    JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS, 2011, 12 (01): 17 - 24
  • [7] Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
    Peysakhovich, Alexander
    AIES '19: PROCEEDINGS OF THE 2019 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2019, : 409 - 415
  • [8] Online Observer-Based Inverse Reinforcement Learning
    Self, Ryan
    Coleman, Kevin
    Bai, He
    Kamalapurkar, Rushikesh
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (06): 1922 - 1927
  • [9] Online Inverse Reinforcement Learning with Learned Observation Model
    Arora, Saurabh
    Doshi, Prashant
    Banerjee, Bikramjit
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1468 - 1477
  • [10] Online inverse reinforcement learning for nonlinear systems with adversarial attacks
    Lian, Bosen
    Xue, Wenqian
    Lewis, Frank L.
    Chai, Tianyou
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2021, 31 (14) : 6646 - 6667