Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Cited by: 5
Authors
Zhang, Hao [1]
Wang, Hao [1]
Kan, Zhen [1]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Transformers; Robots; Reinforcement learning; Planning; Learning automata; Encoding; Linear temporal logic; motion planning; reinforcement learning
DOI
10.1109/LRA.2023.3290511
CLC classification
TP24 (Robotics)
Discipline classification codes
080202; 1405
Abstract
Automaton-based approaches have enabled robots to perform various complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, which limits their applicability in deep reinforcement learning. To address this issue, we incorporate the Transformer into reinforcement learning and develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: first encoding the LTL instruction via a Transformer module for efficient understanding of task instructions during training, and then encoding the context variable via the Transformer again for improved task performance. In particular, the LTL instruction is specified as co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts a non-Markovian reward decision process into a Markovian one, but also improves sampling efficiency by learning multiple sub-tasks simultaneously. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
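
To make the role of LTL progression concrete, the following is a minimal illustrative sketch of the standard progression rules for co-safe LTL (in the spirit of Bacchus and Kabanza), not the authors' T2TL implementation; the tuple-based formula encoding and the helper names progress, _and, and _or are assumptions introduced for this example.

# Minimal sketch of LTL progression for co-safe LTL (illustration only, not the
# authors' T2TL code). Formulas are nested tuples in negation normal form, e.g.
# ("until", ("prop", "a"), ("prop", "b")).

def progress(formula, label):
    """Rewrite `formula` given the set `label` of propositions true at this step."""
    op = formula[0]
    if op in ("true", "false"):
        return formula
    if op == "prop":                      # atomic proposition
        return ("true",) if formula[1] in label else ("false",)
    if op == "not":                       # negation applied to a proposition (NNF)
        return ("false",) if progress(formula[1], label) == ("true",) else ("true",)
    if op == "and":
        return _and(progress(formula[1], label), progress(formula[2], label))
    if op == "or":
        return _or(progress(formula[1], label), progress(formula[2], label))
    if op == "next":                      # X phi: phi must hold from the next step
        return formula[1]
    if op == "eventually":                # F phi  ->  prog(phi) or F phi
        return _or(progress(formula[1], label), formula)
    if op == "until":                     # p U q  ->  prog(q) or (prog(p) and p U q)
        return _or(progress(formula[2], label),
                   _and(progress(formula[1], label), formula))
    raise ValueError("unknown operator: %s" % op)

def _and(f, g):                           # conjunction with constant folding
    if ("false",) in (f, g):
        return ("false",)
    if f == ("true",):
        return g
    if g == ("true",):
        return f
    return ("and", f, g)

def _or(f, g):                            # disjunction with constant folding
    if ("true",) in (f, g):
        return ("true",)
    if f == ("false",):
        return g
    if g == ("false",):
        return f
    return ("or", f, g)

# Example task: "stay in region a until region b is reached" (a U b).
task = ("until", ("prop", "a"), ("prop", "b"))
print(progress(task, {"a"}))        # unchanged: b not yet reached
print(progress(task, {"a", "b"}))   # ('true',)  task satisfied
print(progress(task, set()))        # ('false',) task violated

Progressing the formula after every environment step yields the sequence of residual sub-goals referred to in the abstract: the formula rewrites to true when the task is satisfied, to false when it is violated, and otherwise to the remaining sub-task, so a Markovian reward can be assigned from the progressed formula alone.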
Pages: 4831-4838
Number of pages: 8