Learning state-action correspondence across reinforcement learning control tasks via partially paired trajectories

Cited by: 0
Authors
Garcia, Javier [1 ]
Rano, Inaki [1 ]
Bures, J. Miguel [2 ]
Fdez-Vidal, Xose R. [2 ]
Iglesias, Roberto [2 ]
Affiliations
[1] Univ Santiago de Compostela, Dept Elect & Comp Sci, Lugo, Spain
[2] Univ Santiago de Compostela, CiTIUS Ctr Invest Tecnoloxias Intelixentes, Santiago de Compostela, Spain
Keywords
Reinforcement learning; Transfer learning; Inter-task mapping
DOI
10.1007/s10489-024-06190-7
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In many reinforcement learning (RL) tasks, the state-action space may change over time (e.g., an increased number of observable features, or a changed representation of actions). When such changes occur, the previously learnt policy will likely fail due to the mismatch of input and output features, and another policy must be trained from scratch, which is inefficient in terms of sample complexity. Recent works in transfer learning have succeeded in making RL algorithms more efficient by incorporating knowledge from previous tasks, thus partially alleviating this problem. However, such methods typically require an explicit state-action correspondence from one task to the other. An autonomous agent may not have access to such high-level information, but it should be able to analyze its experience to identify similarities between tasks. In this paper, we propose a novel method for automatically learning a correspondence of states and actions from one task to another through an agent's experience. In contrast to previous approaches, our method is based on two key insights: i) only the first state of the trajectories of the two tasks is paired, while the rest are unpaired and randomly collected, and ii) the transition model of the source task is used to predict the dynamics of the target task, thus aligning the unpaired states and actions. Additionally, this paper intentionally decouples the learning of the state-action correspondence from the transfer technique used, making it easy to combine with any transfer method. Our experiments demonstrate that our approach significantly accelerates transfer learning across a diverse set of problems, varying in state/action representation, physics parameters, and morphology, when compared to state-of-the-art algorithms that rely on cycle-consistency.
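The two insights in the abstract suggest a concrete training signal, sketched below. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a pretrained source transition model f_src, learnable mappings phi (target state to source state) and psi (target action to source action), and made-up network shapes. Insight (ii) becomes a dynamics-consistency loss on unpaired target transitions; insight (i) becomes an anchoring loss on the paired initial states.

```python
# Hypothetical sketch of the correspondence-learning idea in the abstract.
# f_src, phi, psi, and all dimensions are illustrative assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn


class Mapper(nn.Module):
    """Small MLP mapping a target-task vector into the source-task space."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SourceDynamics(nn.Module):
    """Stand-in for the pretrained source transition model f(s, a) -> s'."""

    def __init__(self, s_dim: int, a_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, s_dim),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))


def dynamics_loss(f_src, phi, psi, s, a, s_next):
    """Insight (ii): the source model predicts where a mapped target
    transition should land; penalize the mismatch in source space."""
    pred_next = f_src(phi(s), psi(a))
    return ((pred_next - phi(s_next)) ** 2).mean()


def anchor_loss(phi, s0_target, s0_source):
    """Insight (i): only the trajectories' first states are paired, so
    they directly supervise the state mapping."""
    return ((phi(s0_target) - s0_source) ** 2).mean()
```

A training step would minimize the sum of the two losses over randomly collected target transitions, updating only phi and psi (f_src stays frozen). Because the learned mappings do not depend on any particular transfer algorithm, they can then be handed to whichever transfer method is used, consistent with the decoupling the abstract describes.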
Pages: 18
Related Papers (showing 10 of 50)
  • [1] Projected State-Action Balancing Weights for Offline Reinforcement Learning
    Wang, Jiayi
    Qi, Zhengling
    Wong, Raymond K. W.
    ANNALS OF STATISTICS, 2023, 51 (04) : 1639 - 1665
  • [2] State-Action Value Function Modeled by ELM in Reinforcement Learning for Hose Control Problems
    Lopez-Guede, Jose Manuel
    Fernandez-Gauna, Borja
    Grana, Manuel
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2013, 21 : 99 - 116
  • [3] A Reinforcement Learning Model Using Deterministic State-Action Sequences
    Murata, Makoto
    Ozawa, Seiichi
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2010, 6 (02) : 577 - 590
  • [4] Efficient Reinforcement Learning Using State-Action Uncertainty with Multiple Heads
    Aizu, Tomoharu
    Oba, Takeru
    Ukita, Norimichi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VIII, 2023, 14261 : 184 - 196
  • [5] Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning
    Pritz, Paul J.
    Ma, Liang
    Leung, Kin K.
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 1447 - 1456
  • [6] Model-Based Reinforcement Learning Exploiting State-Action Equivalence
    Asadi, Mahsa
    Talebi, Mohammad Sadegh
    Bourel, Hippolyte
    Maillard, Odalric-Ambrym
    ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101, 2019, 101 : 204 - 219
  • [7] Swarm Reinforcement Learning Methods for Problems with Continuous State-Action Space
    Iima, Hitoshi
    Kuroe, Yasuaki
    Emoto, Kazuo
    2011 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2011, : 2173 - 2180
  • [8] Learning Multi-Goal Dialogue Strategies Using Reinforcement Learning With Reduced State-Action Spaces
    Cuayahuitl, Heriberto
    Renals, Steve
    Lemon, Oliver
    Shimodaira, Hiroshi
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 469 - +
  • [9] Online Reinforcement Learning Control of Nonlinear Dynamic Systems: A State-action Value Function Based Solution
    Asl, Hamed Jabbari
    Uchibe, Eiji
    NEUROCOMPUTING, 2023, 544
  • [10] Scaling Up Q-Learning via Exploiting State-Action Equivalence
    Lyu, Yunlian
    Come, Aymeric
    Zhang, Yijie
    Talebi, Mohammad Sadegh
    ENTROPY, 2023, 25 (04)