Trajectory planning of mobile robot: A Lyapunov-based reinforcement learning approach with implicit policy

Cited by: 0
Authors
Lai, Jialun [1 ,2 ]
Wu, Zongze [4 ,5 ]
Ren, Zhigang [2 ,3 ]
Tan, Qi [4 ]
Xie, Shengli [2 ]
Affiliations
[1] Guangzhou Maritime Univ, Sch Low Altitude Equipment & Intelligent Control, Guangzhou 510006, Guangdong, Peoples R China
[2] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Guangdong, Peoples R China
[3] Minist Educ, Key Lab Intelligent Detect & Internet Things Mfg G, Guangzhou 510006, Peoples R China
[4] Shenzhen Univ, Coll Mechatron & Control Engn, Shenzhen 518052, Guangdong, Peoples R China
[5] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518123, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Mobile robots; Intelligent control; Lyapunov theory; Reinforcement learning; Dynamical system movement; NAVIGATION; TIME;
DOI
10.1016/j.knosys.2025.113870
CLC Classification Code
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Trajectory planning for mobile robots is crucial for achieving intelligence in many industrial applications. Learning-based approaches are particularly useful for problems whose rules are complex and difficult to specify by hand. However, these approaches often require large amounts of training data and lack convergence guarantees or interpretability. This work proposes a reinforcement learning paradigm that combines an implicit policy with Lyapunov theory to solve the mobile robot trajectory planning problem. First, we develop a weighted asymmetric Lyapunov reward function and provide an analytical solution with modest dynamics as the implicit policy. Then, we propose event-triggered multi-objective policy optimization, which dynamically adjusts the optimization objectives according to event-triggered conditions and is organically fused into a modified Soft Actor-Critic algorithm, thereby shrinking the exploration space and enabling iterative improvement of the RL policy. We demonstrate that, in disturbed and randomized scenarios, the proposed fusion policy achieves specialized policy learning and that its convergence, efficiency, and generalization are verifiable. These results indicate that our approach can serve as a foundational paradigm for end-to-end reward design and motion control in reinforcement learning-based trajectory planning, with significant advantages in convergence speed and interpretability.
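The record does not reproduce the paper's equations, so the following is a minimal sketch under stated assumptions: the quadratic Lyapunov candidate, the asymmetric weights w_inc > w_dec, the trigger threshold eps, and all function names are hypothetical illustrations of the two mechanisms the abstract names (a weighted asymmetric Lyapunov-style reward and an event-triggered switch between optimization objectives), not the authors' actual definitions.

```python
import numpy as np

def lyapunov_value(state, goal):
    """Candidate Lyapunov function: squared distance to the goal (assumed form)."""
    e = np.asarray(state[:2], dtype=float) - np.asarray(goal[:2], dtype=float)
    return float(e @ e)

def asymmetric_lyapunov_reward(v_prev, v_curr, w_dec=1.0, w_inc=3.0):
    """Weighted asymmetric shaping: reward decreases of V, penalize increases more heavily."""
    dv = v_curr - v_prev
    return -w_inc * dv if dv > 0.0 else -w_dec * dv

def event_triggered_objective(v_curr, dv, eps=1e-2):
    """Hypothetical trigger: keep the tracking objective while V decays fast enough,
    otherwise switch the optimization target to stabilization."""
    return "tracking" if dv < -eps * max(v_curr, 1e-6) else "stabilization"

# Usage inside an RL rollout: the shaped reward and the selected objective would be
# combined with the task reward before updating a (modified) Soft Actor-Critic learner.
v_prev = lyapunov_value([1.0, 2.0, 0.3], goal=[0.0, 0.0])
v_curr = lyapunov_value([0.8, 1.7, 0.2], goal=[0.0, 0.0])
r_shape = asymmetric_lyapunov_reward(v_prev, v_curr)
mode = event_triggered_objective(v_curr, v_curr - v_prev)
print(r_shape, mode)
```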
Pages: 15