Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Cited by: 0
Authors
Garcia, Javier [1 ]
Rano, Inaki [1 ]
Bures, J. Miguel [2 ]
Fdez-Vidal, Xose R. [2 ]
Iglesias, Roberto [2 ]
Affiliations
[1] Univ Santiago de Compostela, Dept Elect & Comp Sci, Lugo, Spain
[2] Univ Santiago de Compostela, CiTIUS Ctr Invest Tecnoloxias Intelixentes, Santiago de Compostela, Spain
Keywords
Reinforcement learning; Transfer learning; Inter-task mapping
DOI
10.1007/s10489-024-06190-7
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or changes in the representation of actions). Given these changes, the previously learnt policy will likely fail due to the mismatch of input and output features, and another policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent work in transfer learning has succeeded in making RL algorithms more efficient by incorporating knowledge from previous tasks, thus partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent's experience. In contrast to previous approaches, our method is based on two key insights: i) only the first state of the trajectories of the two tasks is paired, while the rest are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, this paper intentionally decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
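
The following is a minimal illustrative sketch (not the authors' code) of the dynamics-alignment idea the abstract describes, written in PyTorch: two small networks map target states and actions into the source spaces, a pre-trained source transition model source_model(s, a) -> s' scores how well mapped transitions follow the source dynamics, and the paired initial states give a supervised anchor. All names, dimensions, and the stand-in dynamics below are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class StateMap(nn.Module):
        # Maps target-task states into the source-task state space.
        def __init__(self, target_dim, source_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(target_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, source_dim))

        def forward(self, s):
            return self.net(s)

    class ActionMap(nn.Module):
        # Maps target-task actions into the source-task action space.
        def __init__(self, target_dim, source_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(target_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, source_dim))

        def forward(self, a):
            return self.net(a)

    def alignment_loss(state_map, action_map, source_model, batch):
        # Insight ii): the source transition model, applied to the mapped
        # (state, action) pair, should predict the mapped next state, so
        # unpaired target transitions (s, a, s') supervise both maps
        # through the source dynamics.
        s_t, a_t, s_next = batch
        pred_next = source_model(state_map(s_t), action_map(a_t))
        return nn.functional.mse_loss(pred_next, state_map(s_next))

    def paired_start_loss(state_map, s_target0, s_source0):
        # Insight i): the paired initial states of the two tasks give a
        # direct supervised anchor for the state map.
        return nn.functional.mse_loss(state_map(s_target0), s_source0)

    # Illustrative usage with random stand-in data (all dimensions assumed):
    state_map, action_map = StateMap(6, 4), ActionMap(2, 1)
    s, a, s2 = torch.randn(32, 6), torch.randn(32, 2), torch.randn(32, 6)
    source_model = lambda s, a: s + 0.1 * torch.tanh(a)  # stand-in dynamics
    loss = alignment_loss(state_map, action_map, source_model, (s, a, s2))

In practice both losses would be minimized jointly over the mapping networks while the source transition model stays frozen; because the learned maps are decoupled from the transfer technique, any downstream transfer method could then consume them, consistent with the decoupling the abstract emphasizes.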
Pages: 18