Safe reinforcement learning-based control using deep deterministic policy gradient algorithm and slime mould algorithm with experimental tower crane system validation

Cited by: 3

Authors:

Zamfirache, Iuliu Alexandru [1]

Precup, Radu-Emil [1,2]

Petriu, Emil M. [3]

Affiliations:

[1] Politehn Univ Timisoara, Dept Automat & Appl Informat, Bd V Parvan 2, Timisoara 300223, Romania

[2] Romanian Acad, Ctr Fundamental & Adv Tech Res, Timisoara Branch, Bd Mihai Viteazu 24, Timisoara 300223, Romania

[3] Univ Ottawa, Sch Elect Engn & Comp Sci, 800 King Edward, Ottawa, ON K1N 6N5, Canada

Source:

INFORMATION SCIENCES | 2025 / Vol. 692

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Deep deterministic policy gradient; Optimal reference tracking control; Safe reinforcement learning; Slime mould algorithm; Tower crane systems;

DOI:

10.1016/j.ins.2024.121640

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper presents a novel optimal control approach that combines the safe Reinforcement Learning (RL) framework, represented by a Deep Deterministic Policy Gradient (DDPG) algorithm, with a Slime Mould Algorithm (SMA) as a representative nature-inspired optimization algorithm. The main drawbacks of the traditional DDPG-based safe RL optimal control approach are the possible instability of the control system caused by randomly generated initial values of the controller parameters and the lack of state safety guarantees in the first iterations of the learning process. These drawbacks have two causes: (i) the safety constraints are considered only in the DDPG-based training process of the controller, which is usually implemented as a neural network (NN); (ii) the weights and the biases of the NN-based controller are initialized with randomly generated values. The proposed approach mitigates these drawbacks by initializing the parameters of the NN-based controller using SMA. The fitness function of the SMA-based initialization process is designed to incorporate state safety constraints into the search process, resulting in an initial NN-based controller with embedded state safety constraints. The proposed approach is compared to the classical one using real-time experimental results, performance indices popular for optimal reference tracking control problems, and a performance index based on a state safety score.
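
The idea of building state safety constraints into the fitness function used for initialization can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the first-order plant, the 13-parameter network layout, the penalty weight, the state limit, and the names `controller`, `fitness` and `population_search`. The search loop is a generic population-based stand-in, not the Slime Mould Algorithm.

```python
import math
import random

N_PARAMS = 13  # hypothetical layout: 6 input weights, 3 hidden biases, 3 output weights, 1 output bias


def controller(params, error, state):
    """Tiny NN controller: 2 inputs -> 3 tanh hidden units -> 1 linear output."""
    hidden = []
    for j in range(3):
        z = params[2 * j] * error + params[2 * j + 1] * state + params[6 + j]
        hidden.append(math.tanh(z))
    return sum(params[9 + j] * hidden[j] for j in range(3)) + params[12]


def fitness(params, reference=1.0, state_limit=1.5, steps=100, dt=0.05, penalty=100.0):
    """Squared tracking error plus a penalty for each step where |x| exceeds state_limit."""
    x = 0.0
    cost = 0.0
    for _ in range(steps):
        error = reference - x
        u = controller(params, error, x)
        x += dt * (-x + u)  # assumed toy plant x' = -x + u, Euler-discretized
        cost += error ** 2
        if abs(x) > state_limit:
            cost += penalty  # the state safety constraint enters the search here
    return cost


def population_search(n_agents=20, iterations=30, seed=0):
    """Generic population-based search (a stand-in, NOT the actual SMA update rules)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(N_PARAMS)] for _ in range(n_agents)]
    best = min(population, key=fitness)
    for _ in range(iterations):
        for i, agent in enumerate(population):
            # Move relative to the best agent, with a small random perturbation.
            candidate = [b + rng.gauss(0, 0.3) * (b - a) + rng.gauss(0, 0.05)
                         for a, b in zip(agent, best)]
            if fitness(candidate) < fitness(agent):
                population[i] = candidate  # greedy replacement
        best = min(population + [best], key=fitness)
    return best


# The returned vector would seed the NN controller before gradient-based training starts.
initial_params = population_search()
```

In this sketch the returned parameter vector replaces the random initial weights and biases, so a controller that violates the state limit during the simulated rollout is already disfavoured before any RL training begins.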

Pages: 18

15 items in total

[1] Adaptive reinforcement learning-based control using proximal policy optimization and slime mould algorithm with experimental tower crane system validation
Zamfirache I.A.
Precup R.-E.
Petriu E.M.
Applied Soft Computing, 2024, 160
[2] Reinforcement Learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system
Zamfirache, Iuliu Alexandru
Precup, Radu-Emil
Roman, Raul-Cristian
Petriu, Emil M.
INFORMATION SCIENCES, 2022, 583 : 99 - 120
[3] Policy Iteration Reinforcement Learning-based control using a Grey Wolf Optimizer algorithm
Zamfirache, Iuliu Alexandru
Precup, Radu-Emil
Roman, Raul-Cristian
Petriu, Emil M.
INFORMATION SCIENCES, 2022, 585 : 162 - 175
[4] A Multi-Variable Coupled Control Strategy Based on a Deep Deterministic Policy Gradient Reinforcement Learning Algorithm for a Small Pressurized Water Reactor
Chen, Jie
Xiao, Kai
Huang, Ke
Yang, Zhen
Chu, Qing
Jiang, Guanfu
ENERGIES, 2025, 18 (06)
[5] Deep Deterministic Policy Gradient Algorithm based Lateral and Longitudinal Control for Autonomous Driving
Zhu Gongsheng
Pei Chunmei
Ding Jiang
Shi Junfeng
2020 5TH INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE 2020), 2020 : 736 - 741
[6] Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm
Wang, Pin
Li, Hanhan
Chan, Ching-Yao
2019 30TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV19), 2019 : 1454 - 1460
[7] Agent-Based Energy Sharing Mechanism Using Deep Deterministic Policy Gradient Algorithm
Kuang, Yi
Wang, Xiuli
Zhao, Hongyang
Huang, Yijun
Chen, Xianlong
Wang, Xifan
ENERGIES, 2020, 13 (19)
[8] Enhanced Deep Deterministic Policy Gradient Algorithm Using Grey Wolf Optimizer for Continuous Control Tasks
Sumiea, Ebrahim Hamid Hasan
Abdulkadir, Said Jadid
Ragab, Mohammed Gamal
Al-Selwi, Safwan Mahmood
Fati, Suliamn Mohamed
Alqushaibi, Alawi
Alhussian, Hitham
IEEE ACCESS, 2023, 11 : 139771 - 139784
[9] Unmanned Aerial Vehicle Trajectory Planning and Power Control Algorithm Based on Deep Deterministic Policy Gradient
Yang Q.
Chen J.
Peng Y.
Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2023, 46 (03) : 43 - 48
[10] Cooperative Control of Power Grid Frequency Based on Expert-Guided Deep Deterministic Policy Gradient Algorithm
Shen, Tao
Zhang, Jing
He, Yu
Yang, Shengsun
Zhang, Demu
Yang, Zhaorui
IEEE ACCESS, 2025, 13 : 38502 - 38514