Adaptive reinforcement learning-based control using proximal policy optimization and slime mould algorithm with experimental tower crane system validation

Cited by: 18
Authors
Zamfirache I.A. [1]
Precup R.-E. [1, 2]
Petriu E.M. [3]
Affiliations
[1] Politehnica University of Timisoara, Department of Automation and Applied Informatics, Bd. V. Parvan 2, Timisoara
[2] Romanian Academy – Timisoara Branch, Center for Fundamental and Advanced Technical Research, Bd. Mihai Viteazu 24, Timisoara
[3] University of Ottawa, School of Electrical Engineering and Computer Science, 800 King Edward, Ottawa, K1N 6N5, ON
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Adaptive reinforcement learning; Proximal Policy Optimization; Reference tracking control; Slime Mould Algorithm; Tower crane systems
DOI
10.1016/j.asoc.2024.111687
Abstract
This paper presents a novel optimal reference tracking control approach that combines a popular policy gradient Reinforcement Learning (RL) algorithm, namely Proximal Policy Optimization (PPO), with a metaheuristic, the Slime Mould Algorithm (SMA). One of the most important parameters in the PPO-based RL process is the learning rate, which strongly affects how the parameters of the actor neural network (NN) are iteratively updated. In every episode of the RL process, the updates to the weights and biases of the actor NN are scaled by the learning rate, which determines how far the learning agent steps in a direction computed from previous experiences. The classical PPO algorithm usually relies on fixed learning rate values, which change rarely or not at all during the learning process. The main drawback of fixed values is that the learning agent cannot exploit positive momentum by accelerating towards good learning experiences, nor slow down and quickly change direction after consecutive negative learning experiences. The main objective of the combination proposed in this paper is to create an adaptive SMA-based PPO approach for control systems that, instead of using fixed learning rate values, uses the SMA to compute optimal learning rate values in each time step of the learning process based on the progress of the learning agent. This paper investigates whether the adaptive SMA-based PPO control approach can be considered an alternative to the classical PPO version, which employs fixed learning rate values. The comparison uses control system performance indices gathered while performing an optimal reference tracking control task on tower crane system laboratory equipment. © 2024 The Authors
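The contrast between a fixed and an adaptively chosen learning rate can be illustrated with a minimal sketch. It is not the authors' implementation. A one-dimensional toy objective stands in for the PPO objective, and a plain random candidate search stands in for the SMA. All names (`objective`, `gradient`, `pick_learning_rate`, `train`) and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def objective(theta):
    """Toy stand-in for the policy objective (higher is better); assumed for illustration."""
    return -(theta - 3.0) ** 2


def gradient(theta):
    """Analytic gradient of the toy objective."""
    return -2.0 * (theta - 3.0)


def pick_learning_rate(theta, grad, rng, low=1e-3, high=1.0, n_candidates=20):
    """Hypothetical stand-in for the SMA search (this is NOT the Slime Mould Algorithm).

    Samples candidate learning rates and keeps the one whose update
    yields the highest objective value.
    """
    candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
    return max(candidates, key=lambda lr: objective(theta + lr * grad))


def train(steps, fixed_lr=None, seed=0):
    """Gradient ascent with either a fixed or a per-step selected learning rate."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        grad = gradient(theta)
        if fixed_lr is not None:
            lr = fixed_lr
        else:
            lr = pick_learning_rate(theta, grad, rng)
        theta += lr * grad  # the update step is scaled by the learning rate
    return theta


fixed_theta = train(steps=10, fixed_lr=0.01)
adaptive_theta = train(steps=10)
print("fixed:", objective(fixed_theta), "adaptive:", objective(adaptive_theta))
```

In this toy setting the per-step selection gets closer to the optimum in the same number of steps than the small fixed value does. This says nothing about the results reported in the paper, which are obtained on tower crane laboratory equipment.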