Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Cited: 0
Authors
Garcia, Javier [1 ]
Rano, Inaki [1 ]
Bures, J. Miguel [2 ]
Fdez-Vidal, Xose R. [2 ]
Iglesias, Roberto [2 ]
Affiliations
[1] Univ Santiago De Compostela, Dept Elect & Comp Sci, Lugo, Spain
[2] Univ Santiago de Compostela, CiTIUS Ctr Invest Tecnoloxias Intelixentes, Santiago De Compostela, Spain
Keywords
Reinforcement learning; Transfer learning; Inter-task mapping; Networks
DOI
10.1007/s10489-024-06190-7
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In many reinforcement learning (RL) tasks, the state-action space may be subject to changes over time (e.g., an increased number of observable features, or changes in the representation of actions). Given these changes, the previously learnt policy will likely fail due to the mismatch of input and output features, and another policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent works in transfer learning have succeeded in making RL algorithms more efficient by incorporating knowledge from previous tasks, thus partially alleviating this problem. However, such methods typically must provide an explicit state-action correspondence of one task into the other. An autonomous agent may not have access to such high-level information, but should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent's experience. In contrast to previous approaches, our method is based on two key insights: i) only the first state of the trajectories of the two tasks is paired, while the rest are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, this paper intentionally decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
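The abstract's second insight, using the known source transition model to align unpaired target transitions while paired trajectory starts pin down the map, can be illustrated with a deliberately simplified linear sketch. Everything below (the linear dynamics `A`, `B`, the change-of-basis `T`, and the learned map `W`) is a hypothetical toy construction for illustration, not the paper's actual algorithm, which is not restricted to linear correspondences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear source transition model: s' = A s + B a
# (assumed already learned in the source task).
d_s, d_a = 3, 2
A = 0.5 * rng.normal(size=(d_s, d_s))
B = 0.5 * rng.normal(size=(d_s, d_a))

# Toy target task: same dynamics seen through an unknown invertible
# change of state basis T (the ground-truth correspondence).
T = rng.normal(size=(d_s, d_s))
T_inv = np.linalg.inv(T)

# Unpaired, randomly collected target transitions (s, a, s'),
# consistent with s'_tgt = T^{-1} (A T s_tgt + B a).
n = 200
S = rng.normal(size=(d_s, n))
U = rng.normal(size=(d_a, n))
S_next = T_inv @ (A @ (T @ S) + B @ U)

# One paired trajectory start: the same initial state in both
# representations (the only pairing the method assumes).
s0_t = rng.normal(size=d_s)
s0_s = T @ s0_t

# Solve for the state map W (s_src = W s_tgt) from two linear conditions:
#   alignment: A (W s) + B a = W s'  for every unpaired transition,
#   pairing:   W s0_t = s0_s        for the paired first state.
# Vectorised column-major: vec(A W s) = (s^T kron A) vec(W).
rows, rhs = [], []
for i in range(n):
    rows.append(np.kron(S[:, i], A) - np.kron(S_next[:, i], np.eye(d_s)))
    rhs.append(-B @ U[:, i])
rows.append(np.kron(s0_t, np.eye(d_s)))  # pairing anchor
rhs.append(s0_s)

M = np.vstack(rows)
b = np.concatenate(rhs)
w, *_ = np.linalg.lstsq(M, b, rcond=None)
W = w.reshape(d_s, d_s, order="F")

print(np.max(np.abs(W - T)))  # close to zero: the correspondence is recovered
```

In this noiseless linear toy, the source model's one-step predictions over unpaired transitions, plus a single paired initial state, suffice to recover the correspondence exactly; the paper tackles the general nonlinear case where such a closed-form solve is unavailable.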
Pages: 18
Related Papers (50 in total)
  • [31] A reward allocation method for reinforcement learning in stabilizing control tasks
    Hosokawa, Shu
    Kato, Joji
    Nakano, Kazushi
    ARTIFICIAL LIFE AND ROBOTICS, 2014, 19 (02) : 109 - 114
  • [32] Reinforcement learning with augmented states in partially expectation and action observable environment
    Guirnaldo, SA
    Watanabe, K
    Izumi, K
    Kiguchi, K
    SICE 2002: PROCEEDINGS OF THE 41ST SICE ANNUAL CONFERENCE, VOLS 1-5, 2002, : 823 - 828
  • [33] Experiments of conditioned reinforcement learning in continuous space control tasks
    Fernandez-Gauna, Borja
    Osa, Juan Luis
    Grana, Manuel
    NEUROCOMPUTING, 2018, 271 : 38 - 47
  • [34] Safe Reinforcement Learning via Episodic Control
    Li, Zhuo
    Zhu, Derui
    Grossklags, Jens
    IEEE ACCESS, 2025, 13 : 35270 - 35280
  • [35] Learning with policy prediction in continuous state-action multi-agent decision processes
    Farzaneh Ghorbani
    Mohsen Afsharchi
    Vali Derhami
    Soft Computing, 2020, 24 : 901 - 918
  • [36] Reinforcement Learning of Chaotic Systems Control in Partially Observable Environments
    Weissenbacher, Max
    Borovykh, Anastasia
    Rigas, Georgios
    FLOW TURBULENCE AND COMBUSTION, 2025,
  • [37] A dynamic control decision approach for fixed-wing aircraft games via hybrid action reinforcement learning
    Zhuang, Xing
    Li, Dongguang
    Li, Hanyu
    Wang, Yue
    Zhu, Jihong
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03)
  • [38] Counterfactual state explanations for reinforcement learning agents via generative deep learning
    Olson, Matthew L.
    Khanna, Roli
    Neal, Lawrence
    Li, Fuxin
    Wong, Weng-Keen
    ARTIFICIAL INTELLIGENCE, 2021, 295
  • [39] Learning Assembly Tasks in a Few Minutes by Combining Impedance Control and Residual Recurrent Reinforcement Learning
    Kulkarni, Padmaja
    Kober, Jens
    Babuska, Robert
    Della Santina, Cosimo
    ADVANCED INTELLIGENT SYSTEMS, 2022, 4 (01)
  • [40] Experiments with reinforcement learning in problems with continuous state and action spaces
    Santamaria, JC
    Sutton, RS
    Ram, A
    ADAPTIVE BEHAVIOR, 1997, 6 (02) : 163 - 217