I2RL: online inverse reinforcement learning under occlusion

Cited by: 0

Authors:

Saurabh Arora

Prashant Doshi

Bikramjit Banerjee

Affiliations:

[1] University of Georgia, THINC Lab, Department of Computer Science, 415 Boyd GSRC

[2] University of Southern Mississippi, School of Computing Sciences and Computer Engineering

Source:

Autonomous Agents and Multi-Agent Systems | 2021 / Vol. 35

Keywords:

Robot learning; Online learning; Robotics; Reinforcement learning; Inverse reinforcement learning

DOI:

Not available

Abstract:

Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observing its behavior on a task. It inverts RL, which focuses on learning an agent’s behavior on a task based on the reward signals received. IRL is witnessing sustained attention due to promising applications in robotics, computer games, and finance, as well as in other sectors. Methods for IRL have, for the most part, focused on batch settings where the observed agent’s behavioral data has already been collected. However, the related problem of online IRL, where observations are incrementally accrued yet the real-time demands of the application often prohibit a full rerun of an IRL method, has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into it. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data as well as probabilistically bounded error, both under full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application situated in worlds of varied sizes, which involves learning under high levels of occlusion, show significantly improved performance of I2RL compared to both batch IRL and an online imitation learning method.
