I2RL: online inverse reinforcement learning under occlusion

Cited by: 0
Authors
Saurabh Arora
Prashant Doshi
Bikramjit Banerjee
Affiliations
[1] University of Georgia, THINC Lab, Department of Computer Science, 415 Boyd GSRC
[2] University of Southern Mississippi, School of Computing Sciences and Computer Engineering
Source
Autonomous Agents and Multi-Agent Systems | 2021 / Volume 35
Keywords
Robot learning; Online learning; Robotics; Reinforcement learning; Inverse reinforcement learning;
DOI
Not available
Abstract
Inverse reinforcement learning (IRL) is the problem of learning an agent's preferences from observations of its behavior on a task. It inverts RL, which focuses on learning an agent's behavior on a task from the reward signals received. IRL is receiving sustained attention due to promising applications in robotics, computer games, finance, and other sectors. Methods for IRL have, for the most part, focused on batch settings in which the observed agent's behavioral data has already been collected. However, the related problem of online IRL, in which observations accrue incrementally yet the application's real-time demands often prohibit a full rerun of an IRL method, has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into it. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data, as well as probabilistically bounded error, under both full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application, situated in worlds of varied sizes and involving learning under high levels of occlusion, show significantly improved performance of I2RL compared with both batch IRL and an online imitation learning method.
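The abstract's key technical idea, advancing maximum-entropy IRL to an online, session-based setting, can be illustrated with a short sketch. The Python below is a hedged illustration, not the authors' algorithm: it assumes a small discrete MDP with known dynamics P[s, a, s'], a linear reward theta . phi(s), and fully observed demonstrations arriving one session at a time. The function names (soft_value_iteration, expected_features, i2rl_session) are hypothetical, and the occlusion handling that the paper contributes (marginalizing over hidden time steps) is omitted for brevity.

import numpy as np

def soft_value_iteration(P, phi, theta, gamma=0.95, iters=100):
    # Soft (MaxEnt) Bellman backups; returns a stochastic policy pi[s, a].
    n_states, n_actions, _ = P.shape
    r = phi @ theta                          # linear reward per state
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)     # Q[s, a]
        V = np.log(np.exp(Q).sum(axis=1))    # log-sum-exp backup
    return np.exp(Q - V[:, None])

def expected_features(P, pi, phi, start, gamma=0.95, iters=100):
    # Discounted feature expectations of the learner's current policy.
    d = start.astype(float).copy()
    mu = np.zeros(phi.shape[1])
    for _ in range(iters):
        mu += d @ phi
        d = gamma * np.einsum('s,sa,sap->p', d, pi, P)
    return mu

def i2rl_session(theta, mu_hat, n_sessions, demo, phi, P, start,
                 lr=0.1, gamma=0.95):
    # Fold the new session's demonstration (a list of state indices) into
    # the running mean of empirical feature expectations, the sufficient
    # statistic carried between sessions, then take one gradient step on
    # the MaxEnt log-likelihood. Under occlusion, hidden time steps would
    # be marginalized out here; this sketch assumes full observability.
    mu_demo = sum(gamma**t * phi[s] for t, s in enumerate(demo))
    mu_hat = (n_sessions * mu_hat + mu_demo) / (n_sessions + 1)
    pi = soft_value_iteration(P, phi, theta, gamma)
    grad = mu_hat - expected_features(P, pi, phi, start, gamma)
    return theta + lr * grad, mu_hat, n_sessions + 1

The point of the sketch is the running mean mu_hat: each session folds its demonstration into this sufficient statistic and takes a single gradient step, so the per-session cost stays constant rather than growing with the accumulated data, which is what makes a full rerun of batch IRL unnecessary.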
Related papers
29 items in total
  • [1] I2RL: online inverse reinforcement learning under occlusion
    Arora, Saurabh
    Doshi, Prashant
    Banerjee, Bikramjit
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2021, 35 (01)
  • [2] Online Inverse Reinforcement Learning Under Occlusion
    Arora, Saurabh
    Doshi, Prashant
    Banerjee, Bikramjit
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1170 - 1178
  • [3] Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions
    Bogert, Kenneth
    Doshi, Prashant
    ARTIFICIAL INTELLIGENCE, 2018, 263 : 46 - 73
  • [5] Convergence analysis of an incremental approach to online inverse reinforcement learning
    Jin, Zhuo-jun
    Qian, Hui
    Chen, Shen-yi
    Zhu, Miao-liang
    JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS, 2011, 12 (01): 17 - 24
  • [7] Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
    Peysakhovich, Alexander
    AIES '19: PROCEEDINGS OF THE 2019 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2019, : 409 - 415
  • [8] Online Observer-Based Inverse Reinforcement Learning
    Self, Ryan
    Coleman, Kevin
    Bai, He
    Kamalapurkar, Rushikesh
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (06): 1922 - 1927
  • [9] Online Inverse Reinforcement Learning with Learned Observation Model
    Arora, Saurabh
    Doshi, Prashant
    Banerjee, Bikramjit
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1468 - 1477
  • [10] Online inverse reinforcement learning for nonlinear systems with adversarial attacks
    Lian, Bosen
    Xue, Wenqian
    Lewis, Frank L.
    Chai, Tianyou
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2021, 31 (14) : 6646 - 6667