Mobile agent path planning under uncertain environment using reinforcement learning and probabilistic model checking

Cited by: 13
Authors
Wang, Xia [1 ,2 ]
Liu, Jun [2 ]
Nugent, Chris [2 ]
Cleland, Ian [2 ]
Xu, Yang [3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 610031, Peoples R China
[2] Ulster Univ, Sch Comp, Belfast BT15 1ED, North Ireland
[3] Southwest Jiaotong Univ, Sch Math, Chengdu 610031, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Expected reward; Mobile agent; Uncertain environment; Probabilistic model checking; Q-learning; FRAMEWORK; ALGORITHM;
DOI
10.1016/j.knosys.2023.110355
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The major challenge in mobile agent path planning within an uncertain environment is to determine an effective control model that discovers the target location as quickly as possible, and to evaluate the control system's reliability. To address this challenge, we introduce a learning-verification integrated mobile agent path planning method that achieves both effectiveness and reliability. More specifically, we first propose a modified Q-learning algorithm (a popular reinforcement learning algorithm), called the QEA-learning algorithm, to find the best Q-table for the environment. We then determine the location transition probability matrix and establish a probability model under the assumption that the agent selects locations with higher Q-values. Secondly, the learnt behaviour of the mobile agent based on the QEA-learning algorithm is formalized as a Discrete-Time Markov Chain (DTMC) model. Thirdly, the required reliability requirements of the mobile agent control system are specified using Probabilistic Computation Tree Logic (PCTL). The DTMC model and the specified properties are then taken as input to the probabilistic model checker PRISM for automatic verification, which is performed to evaluate and verify the control system's reliability. Finally, a case study of a mobile agent walking in a grid map is used to illustrate the proposed learning algorithm, with a special focus on the modelling approach, demonstrating how PRISM can be used to analyse and evaluate the reliability of the mobile agent control system learnt via the proposed algorithm. The results show that the path identified using the proposed integrated method yields the largest expected reward. (c) 2023 Elsevier B.V. All rights reserved.
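The abstract's pipeline, learning a Q-table and then converting Q-values into transition probabilities for a DTMC, can be illustrated with a minimal sketch. This is not the paper's QEA-learning algorithm: it is plain Q-learning on a hypothetical 1-D corridor (state and parameter names are our own assumptions), with a softmax over Q-values standing in for the paper's "agent selects locations with higher Q-values" assumption.

```python
import math
import random

# Hypothetical toy environment (not from the paper): a 1-D corridor with
# states 0..4, target at state 4; actions: 0 = left, 1 = right.
N_STATES, TARGET = 5, 4
ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 500

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Deterministic move; reward 1 only on reaching the target."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == TARGET else 0.0)

# Standard tabular Q-learning with epsilon-greedy exploration.
for _ in range(EPISODES):
    s = 0
    while s != TARGET:
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

def transition_probs(s):
    """One row of a DTMC transition matrix derived from the learnt Q-table:
    higher-Q actions get higher probability (softmax is our assumption)."""
    z = [math.exp(q) for q in Q[s]]
    return [v / sum(z) for v in z]
```

The derived rows (one per state) form the location transition probability matrix; in the paper this matrix is then encoded as a DTMC and checked against PCTL properties in PRISM.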
Pages: 10