Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

Cited by: 17

Authors:

Cai, Mingyu [1]

Xiao, Shaoping [1]

Li, Baoluo [2]

Li, Zhiliang [2]

Kan, Zhen [2]

Affiliations:

[1] Univ Iowa, Dept Mech Engn, Iowa City, IA 52242 USA

[2] Univ Sci & Technol China, Dept Automat, Hefei, Anhui, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

MARKOV DECISION-PROCESSES;

DOI:

10.1109/ICRA48506.2021.9561903

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper presents a model-free reinforcement learning (RL) algorithm that synthesizes a control policy maximizing the satisfaction probability of complex tasks expressed as linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, the robot motion is modeled as a probabilistic labeled Markov decision process (PL-MDP) with unknown transition probabilities and probabilistic labeling functions. The LTL task specification is converted to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets to maintain dense rewards during learning. The novelty of applying the LDGBA lies in constructing an embedded LDGBA (ELDGBA) through a synchronous tracking-frontier function, which records the accepting sets of the LDGBA not yet visited in each round of the repeated visiting pattern and thereby overcomes the difficulties of applying a conventional LDGBA directly. With appropriate dependent reward and discount functions, rigorous analysis shows that any method that optimizes the expected discounted return of the RL-based approach is guaranteed to find the optimal policy maximizing the satisfaction probability of the LTL specification. A model-free RL-based motion planning strategy is developed to generate the optimal policy. The effectiveness of the RL-based control synthesis is demonstrated through simulation and experimental results.
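
The tracking-frontier idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's definition: the function name `update_frontier`, the representation of accepting sets as Python sets of automaton states, and the reset rule (restore all accepting sets once every set has been visited in the current round) are hypothetical choices made here.

```python
from typing import FrozenSet, Hashable, Sequence, Set, Tuple


def update_frontier(
    q: Hashable,
    frontier: FrozenSet[int],
    accepting_sets: Sequence[Set[Hashable]],
) -> Tuple[FrozenSet[int], bool]:
    """Update the set of accepting-set indices still to be visited this round.

    Assumption (not taken from the paper): visiting automaton state q removes
    every accepting set that contains q from the frontier; when the frontier
    becomes empty, a new round starts and the frontier is reset to all sets.
    Returns the new frontier and whether a not-yet-visited set was hit.
    """
    hit = frozenset(j for j in frontier if q in accepting_sets[j])
    remaining = frontier - hit
    if hit and not remaining:
        # Every accepting set was visited in this round: start a new round.
        remaining = frozenset(range(len(accepting_sets)))
    return remaining, bool(hit)


# Hypothetical automaton with two accepting sets, F_0 = {"q1"} and F_1 = {"q2"}.
accepting_sets = [{"q1"}, {"q2"}]
frontier = frozenset(range(len(accepting_sets)))
for q in ["q1", "q1", "q2"]:
    frontier, hit = update_frontier(q, frontier, accepting_sets)
    print(q, sorted(frontier), hit)
```

In this sketch, revisiting `q1` within the same round does not count as a new hit, so a reward tied to `hit` would only be issued for accepting sets not yet visited in the current round.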


Pages: 806-812

Page count: 7

References: 27 in total

[1] Baier C, 2008, PRINCIPLES OF MODEL CHECKING, P1

[2] Bozkurt A. K., 2020, INT C ROB AUT, P10349

[3] Cai M., 2020, ARXIV200714325

[4] Cai M., 2020, ARXIV201006797

[5] Cai M., 2020, IEEE T AUTOM CONTROL

[6] Cai, Mingyu; Peng, Hao; Li, Zhijun; Gao, Hongbo; Kan, Zhen. Receding Horizon Control-Based Motion Planning With Partially Infeasible LTL Constraints [J]. IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (04): 1279-1284

[7] Camacho A, 2019, PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P6065

[8] Ding, Xuchu; Smith, Stephen L.; Belta, Calin; Rus, Daniela. Optimal Control of Markov Decision Processes With Linear Temporal Logic Constraints [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (05): 1244-1257

[9] Fu J., 2014, P ROB SCI SYST ROB C

[10] Gao, Qitong; Hajinezhad, Davood; Zhang, Yan; Kantaros, Yiannis; Zavlanos, Michael M. Reduced Variance Deep Reinforcement Learning with Temporal Logic Specifications [J]. ICCPS '19: PROCEEDINGS OF THE 2019 10TH ACM/IEEE INTERNATIONAL CONFERENCE ON CYBER-PHYSICAL SYSTEMS, 2019: 237-248
