Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Times cited: 5
Authors
Zhang, Hao [1 ]
Wang, Hao [1 ]
Kan, Zhen [1 ]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Transformers; Robots; Reinforcement learning; Planning; Learning automata; Encoding; Linear temporal logic; motion planning; reinforcement learning
DOI
10.1109/LRA.2023.3290511
CLC number
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
Automaton-based approaches have enabled robots to perform various complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the task at hand, limiting their applicability in deep reinforcement learning. To address this issue, we incorporate the Transformer into reinforcement learning and develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: first encoding the LTL instruction via a Transformer module for efficient understanding of task instructions during training, and then encoding the context variable via the Transformer again for improved task performance. In particular, the LTL instruction is specified in co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts non-Markovian reward decision processes into Markovian ones, but also improves sampling efficiency through the simultaneous learning of multiple sub-tasks. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
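To make the LTL progression mentioned in the abstract concrete, the following is a minimal sketch of the standard progression rules for co-safe LTL in negation normal form. The nested-tuple encoding of formulas and all function names here are our own illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of LTL progression for co-safe LTL in negation normal
# form (negation applied to atoms only). Formulas are nested tuples,
# e.g. ('until', 'a', 'b') for "a U b"; atoms are strings; True/False
# are the constant formulas. This encoding is an illustrative assumption.

def simplify_or(f, g):
    # Boolean simplification so progressed formulas stay compact.
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False:
        return f
    return ('or', f, g)

def simplify_and(f, g):
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True:
        return f
    return ('and', f, g)

def progress(phi, obs):
    """One-step progression of formula `phi` through the set of atomic
    propositions `obs` observed at the current time step. Returns the
    sub-goal formula that remains to be satisfied (True = task done)."""
    if phi is True or phi is False:
        return phi
    if isinstance(phi, str):            # atomic proposition
        return phi in obs
    op = phi[0]
    if op == 'not':                     # NNF: negation on atoms only
        return phi[1] not in obs
    if op == 'and':
        return simplify_and(progress(phi[1], obs), progress(phi[2], obs))
    if op == 'or':
        return simplify_or(progress(phi[1], obs), progress(phi[2], obs))
    if op == 'next':                    # X f: f must hold next step
        return phi[1]
    if op == 'eventually':              # F f  ==  f | X F f
        return simplify_or(progress(phi[1], obs), phi)
    if op == 'until':                   # f U g  ==  g | (f & X (f U g))
        return simplify_or(progress(phi[2], obs),
                           simplify_and(progress(phi[1], obs), phi))
    raise ValueError(f"unknown operator: {op}")
```

Used inside an RL loop, the progressed formula serves as the Markovian task state: the agent is rewarded when `progress` returns `True`, and the residual formula at each step is the decomposed sub-goal fed to the policy.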
Pages: 4831-4838
Page count: 8