Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Cited: 0
Authors
Garcia, Javier [1 ]
Rano, Inaki [1 ]
Bures, J. Miguel [2 ]
Fdez-Vidal, Xose R. [2 ]
Iglesias, Roberto [2 ]
Affiliations
[1] Univ Santiago de Compostela, Dept Elect & Comp Sci, Lugo, Spain
[2] Univ Santiago de Compostela, CiTIUS Ctr Invest Tecnoloxias Intelixentes, Santiago De Compostela, Spain
Keywords
Reinforcement learning; Transfer learning; Inter-task mapping; NETWORKS
DOI
10.1007/s10489-024-06190-7
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or a change in the representation of actions). After such changes, the previously learnt policy will likely fail due to the mismatch of input and output features, and another policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent work in transfer learning has made RL algorithms more efficient by incorporating knowledge from previous tasks, partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but it should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent's experience. In contrast to previous approaches, our method is based on two key insights: i) only the first states of the trajectories of the two tasks are paired, while the remaining states are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, this paper deliberately decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
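The two key insights in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: it assumes known linear source dynamics, linear correspondence maps, and a single paired initial state, so that aligning unpaired target transitions against the source transition model reduces to one least-squares problem. All matrix names (`A`, `B`, `M`, `N`, `F`, `G`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2)

# Source transition model s' = A s + B a (the paper would learn this model;
# here it is given and linear, purely for illustration).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = I2.copy()

# Target task: the same system under different state/action representations,
# x = M s and u = N a.  M and N are unknown to the learner.
M = np.array([[0.0, 2.0], [1.0, 0.0]])
N = np.array([[0.0, 1.0], [3.0, 0.0]])

def target_step(x, u):
    """Simulate one target-task transition (used only to generate data)."""
    s, a = np.linalg.solve(M, x), np.linalg.solve(N, u)
    return M @ (A @ s + B @ a)

# Insight i): randomly collected, UNPAIRED target transitions ...
n = 500
X = rng.normal(size=(n, 2))
U = rng.normal(size=(n, 2))
Xn = np.stack([target_step(x, u) for x, u in zip(X, U)])

# ... plus a single PAIRED initial state (x0 in the target <-> s0 in the source).
x0 = rng.normal(size=2)
s0 = np.linalg.solve(M, x0)

# Insight ii): fit maps F (target states -> source states) and G (target
# actions -> source actions) so the SOURCE model predicts TARGET dynamics:
#   A F x + B G u = F x'   for every transition,   and   F x0 = s0.
# Both conditions are linear in vec(F), vec(G) -> a single least-squares solve.
rows, rhs = [], []
for x, u, xn in zip(X, U, Xn):
    rows.append(np.hstack([np.kron(A, x[None]) - np.kron(I2, xn[None]),
                           np.kron(B, u[None])]))
    rhs.append(np.zeros(2))
# The paired initial state anchors the solution (without it, F = G = 0 works).
rows.append(np.hstack([np.kron(I2, x0[None]), np.zeros((2, 4))]))
rhs.append(s0)
sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
F, G = sol[:4].reshape(2, 2), sol[4:].reshape(2, 2)

print(np.round(F @ M, 3))  # ~ identity: F recovers the inverse state map
print(np.round(G @ N, 3))  # ~ identity: G recovers the inverse action map
```

Note how the single paired start state is essential: the dynamics-alignment equations alone are satisfied by the trivial maps F = G = 0, and the anchor is what pins down the correct correspondence.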
Pages: 18
Related papers
50 items total
  • [21] Reinforcement learning in dynamic environment - Abstraction of state-action space utilizing properties of the robot body and environment -
    Takeuchi, Yutaka
    Ito, Kazuyuki
    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 17TH '12), 2012, : 938 - 942
  • [22] Learning State-Specific Action Masks for Reinforcement Learning
    Wang, Ziyi
    Li, Xinran
    Sun, Luoyang
    Zhang, Haifeng
    Liu, Hualin
    Wang, Jun
    ALGORITHMS, 2024, 17 (02)
  • [23] Hierarchical Deep Reinforcement Learning for Continuous Action Control
    Yang, Zhaoyang
    Merrick, Kathryn
    Jin, Lianwen
    Abbass, Hussein A.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (11) : 5174 - 5184
  • [24] Safety reinforcement learning control via transfer learning
    Zhang, Quanqi
    Wu, Chengwei
    Tian, Haoyu
    Gao, Yabin
    Yao, Weiran
    Wu, Ligang
    AUTOMATICA, 2024, 166
  • [25] Reinforcement Learning and Robust Control for Robot Compliance Tasks
    Cheng-Peng Kuan
    Kuu-young Young
    Journal of Intelligent and Robotic Systems, 1998, 23 : 165 - 182
  • [26] Reinforcement learning and robust control for robot compliance tasks
    Kuan, CP
    Young, KY
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 1998, 23 (2-4) : 165 - 182
  • [27] Control of Quadrotor Drone with Partial State Observation via Reinforcement Learning
    Shan, Guangcun
    Zhang, Yinan
    Gao, Yong
    Wang, Tian
    Chen, Jianping
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 1965 - 1968
  • [28] Transfer learning with Partially Constrained Models: Application to reinforcement learning of linked multicomponent robot system control
    Fernandez-Gauna, Borja
    Manuel Lopez-Guede, Jose
    Grana, Manuel
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2013, 61 (07) : 694 - 703
  • [29] Learning with policy prediction in continuous state-action multi-agent decision processes
    Ghorbani, Farzaneh
    Afsharchi, Mohsen
    Derhami, Vali
    SOFT COMPUTING, 2020, 24 (02) : 901 - 918
  • [30] A Reward Allocation Method for Reinforcement Learning in Stabilizing Control Tasks
    Hosokawa, Shu
    Kato, Joji
    Nakano, Kazushi
    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 17TH '12), 2012, : 582 - 585