Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Cited by: 5
Authors
Zhang, Hao [1]
Wang, Hao [1]
Kan, Zhen [1]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Transformers; Robots; Reinforcement learning; Planning; Learning automata; Encoding; Linear temporal logic; motion planning; reinforcement learning
DOI
10.1109/LRA.2023.3290511
CLC classification
TP24 (Robotics)
Discipline classification codes
080202; 1405
Abstract
Automaton-based approaches have enabled robots to perform various complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, which limits their applicability in deep reinforcement learning. To address this issue, we incorporate the Transformer into reinforcement learning and develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: first encoding the LTL instruction via a Transformer module for efficient understanding of task instructions during training, and then encoding the context variable via the Transformer again for improved task performance. In particular, the LTL instruction is specified as co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts a non-Markovian reward decision process into a Markovian one, but also improves sampling efficiency by learning multiple sub-tasks simultaneously. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
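
To make the role of LTL progression concrete, the following is a minimal illustrative sketch of the standard progression rules for co-safe LTL (in the spirit of Bacchus and Kabanza), not the authors' T2TL implementation; the tuple-based formula encoding and the helper names progress, _and, and _or are assumptions introduced for this example.

# Minimal sketch of LTL progression for co-safe LTL (illustration only, not the
# authors' T2TL code). Formulas are nested tuples in negation normal form, e.g.
# ("until", ("prop", "a"), ("prop", "b")).

def progress(formula, label):
    """Rewrite `formula` given the set `label` of propositions true at this step."""
    op = formula[0]
    if op in ("true", "false"):
        return formula
    if op == "prop":                      # atomic proposition
        return ("true",) if formula[1] in label else ("false",)
    if op == "not":                       # negation applied to a proposition (NNF)
        return ("false",) if progress(formula[1], label) == ("true",) else ("true",)
    if op == "and":
        return _and(progress(formula[1], label), progress(formula[2], label))
    if op == "or":
        return _or(progress(formula[1], label), progress(formula[2], label))
    if op == "next":                      # X phi: phi must hold from the next step
        return formula[1]
    if op == "eventually":                # F phi  ->  prog(phi) or F phi
        return _or(progress(formula[1], label), formula)
    if op == "until":                     # p U q  ->  prog(q) or (prog(p) and p U q)
        return _or(progress(formula[2], label),
                   _and(progress(formula[1], label), formula))
    raise ValueError("unknown operator: %s" % op)

def _and(f, g):                           # conjunction with constant folding
    if ("false",) in (f, g):
        return ("false",)
    if f == ("true",):
        return g
    if g == ("true",):
        return f
    return ("and", f, g)

def _or(f, g):                            # disjunction with constant folding
    if ("true",) in (f, g):
        return ("true",)
    if f == ("false",):
        return g
    if g == ("false",):
        return f
    return ("or", f, g)

# Example task: "stay in region a until region b is reached" (a U b).
task = ("until", ("prop", "a"), ("prop", "b"))
print(progress(task, {"a"}))        # unchanged: b not yet reached
print(progress(task, {"a", "b"}))   # ('true',)  task satisfied
print(progress(task, set()))        # ('false',) task violated

Progressing the formula after every environment step yields the sequence of residual sub-goals referred to in the abstract: the formula rewrites to true when the task is satisfied, to false when it is violated, and otherwise to the remaining sub-task, so a Markovian reward can be assigned from the progressed formula alone.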
Pages: 4831-4838
Number of pages: 8