I2RL: online inverse reinforcement learning under occlusion

Cited by: 0
Authors
Saurabh Arora
Prashant Doshi
Bikramjit Banerjee
Affiliations
[1] University of Georgia, THINC Lab, Department of Computer Science, 415 Boyd GSRC
[2] University of Southern Mississippi, School of Computing Sciences and Computer Engineering
Source
Autonomous Agents and Multi-Agent Systems | 2021 / Vol. 35
Keywords
Robot learning; Online learning; Robotics; Reinforcement learning; Inverse reinforcement learning
DOI
Not available
Abstract
Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observing its behavior on a task. It inverts RL, which focuses on learning an agent’s behavior on a task based on the reward signals received. IRL is witnessing sustained attention due to promising applications in robotics, computer games, and finance, as well as in other sectors. Methods for IRL have, for the most part, focused on batch settings where the observed agent’s behavioral data has already been collected. However, the related problem of online IRL, in which observations accrue incrementally yet the real-time demands of the application often prohibit a full rerun of an IRL method, has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into it. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data as well as probabilistically bounded error, under both full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application situated in worlds of varied sizes, which involves learning under high levels of occlusion, show significantly improved performance of I2RL compared with both batch IRL and an online imitation learning method.