On Task-Relevant Loss Functions in Meta-Reinforcement Learning

Cited by: 0
Authors
Shin, Jaeuk [1 ]
Kim, Giho [1 ]
Lee, Howon [2 ]
Han, Joonho [1 ]
Yang, Insoon [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, ASRI, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, ASRI, Seoul, South Korea
Source
6th Annual Learning for Dynamics & Control Conference | 2024 / Vol. 242
Funding
National Research Foundation of Singapore;
Keywords
Reinforcement learning; meta-reinforcement learning;
DOI
Not available
Chinese Library Classification (CLC) number
TP [automation technology; computer technology];
Discipline classification code
0812 ;
Abstract
Designing a meta-reinforcement learning (meta-RL) algorithm that uses data efficiently remains a central challenge for successful real-world application. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. Unlike standard model-based approaches to meta-RL, our method exploits value information to rapidly capture the decision-critical part of the environment. The key component of our method is the loss function used to learn both the task inference module and the system model, which systematically couples the model discrepancy and the value estimate. This coupling enables the proposed algorithm to learn the policy and task inference module with significantly less data than existing meta-RL algorithms. The proposed method is evaluated on high-dimensional robotic control tasks, empirically verifying its effectiveness in extracting, in a sample-efficient manner, the information from observations that is indispensable for solving the tasks.
Pages: 1174-1186 (13 pages)