A deep reinforcement learning control framework for a partially observable system: experimental validation on a rotary flexible link system

Times Cited: 0
Authors
Kumar, V. Joshi [1 ]
Elumalai, Vinodh Kumar [1 ]
Affiliations
[1] Vellore Inst Technol, Sch Elect Engn, Vellore 632014, Tamil Nadu, India
Keywords
Deep reinforcement learning; convolutional deep deterministic policy gradient; flexible link; vibration suppression; tracking control
DOI
10.1080/00207721.2025.2468870
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper puts forward a novel deep reinforcement learning control framework to realise continuous action control for a partially observable system. One of the central problems in continuous action control is finding an optimal policy that makes the agent achieve the control goals without violating the constraints. Although reinforcement learning (RL) techniques are widely applied to optimisation problems in continuous action spaces, a critical limitation of existing methods is that they rely only on a one-step state transition and fail to capitalise on the information available in the sequence of previous states. Consequently, learning an optimal policy for a continuous action space through current techniques may not be effective. Hence, this study attempts to solve the optimisation problem by integrating a convolutional neural network (CNN) into a deep reinforcement learning (DRL) framework and realising an optimal policy through an inverse n-step temporal difference learning method. Moreover, we formulate a novel convolutional deep deterministic policy gradient (CDDPG) algorithm and present its convergence analysis through the Bellman contraction operator. One of the key benefits of the proposed approach is that it improves the performance of the RL agent by not only utilising information from the one-step transition but also extracting the hidden information in previous state sequences. The efficacy of the proposed scheme is experimentally validated on a rotary flexible link (RFL) system for tracking control and vibration suppression problems. The experimental validation on the RFL system highlights that CDDPG offers better tracking and vibration suppression than both the conventional deep deterministic policy gradient (DDPG) and the state-of-the-art proximal policy optimisation (PPO) techniques.
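The abstract refers to an inverse n-step temporal difference learning method whose exact form is not given here. The following LaTeX sketch therefore shows only the standard n-step TD target that such a method builds on, with target critic Q_{\theta^-} and target actor \mu_{\phi^-} in the usual DDPG notation (the notation is an assumption, not taken from the source):

\begin{align}
G_t^{(n)} &= \sum_{k=0}^{n-1} \gamma^{k}\, r_{t+k} \;+\; \gamma^{n}\, Q_{\theta^{-}}\!\left(s_{t+n},\, \mu_{\phi^{-}}(s_{t+n})\right), \\
L(\theta) &= \mathbb{E}\!\left[\left(G_t^{(n)} - Q_{\theta}(s_t, a_t)\right)^{2}\right].
\end{align}

With n = 1 this reduces to the one-step target the abstract identifies as the limitation of conventional DDPG; larger n propagates reward information from longer state sequences into each critic update.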
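As a rough illustration of the CDDPG idea described above (a CNN that consumes a window of past states inside a DDPG-style actor-critic trained against an n-step critic target), here is a minimal PyTorch sketch. All dimensions, layer sizes, and names (ConvActor, ConvCritic, WINDOW, N_STEP) are assumptions for illustration and do not reproduce the paper's implementation:

# Hypothetical sketch of a convolutional DDPG-style update over state windows.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, WINDOW, GAMMA, N_STEP = 4, 1, 8, 0.99, 5

class ConvActor(nn.Module):
    """Maps a window of past observations (B, OBS_DIM, WINDOW) to a bounded action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(OBS_DIM, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * WINDOW, ACT_DIM), nn.Tanh(),
        )
    def forward(self, s):
        return self.net(s)

class ConvCritic(nn.Module):
    """Scores a (state-window, action) pair with a scalar Q-value."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(OBS_DIM, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(16 * WINDOW + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1),
        )
    def forward(self, s, a):
        return self.head(torch.cat([self.conv(s), a], dim=-1))

actor, critic = ConvActor(), ConvCritic()
actor_t, critic_t = ConvActor(), ConvCritic()   # target networks (soft updates omitted)
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

# One update on a dummy n-step batch (in practice, sampled from a replay buffer
# that stores state sequences rather than single transitions).
B = 32
s = torch.randn(B, OBS_DIM, WINDOW)        # window ending at time t
a = torch.rand(B, ACT_DIM) * 2 - 1
rewards = torch.randn(B, N_STEP)           # r_t ... r_{t+n-1}
s_n = torch.randn(B, OBS_DIM, WINDOW)      # window ending at time t+n

with torch.no_grad():
    discounts = GAMMA ** torch.arange(N_STEP, dtype=torch.float32)
    g = (rewards * discounts).sum(dim=1, keepdim=True)       # discounted n-step return
    g += (GAMMA ** N_STEP) * critic_t(s_n, actor_t(s_n))     # bootstrapped tail

critic_loss = nn.functional.mse_loss(critic(s, a), g)        # n-step TD error
opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

actor_loss = -critic(s, actor(s)).mean()   # deterministic policy gradient ascent
opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

The stacked-window input is the mechanism the abstract credits with exploiting hidden information in previous state sequences under partial observability; the convolutional layers replace the fully connected front end of a conventional DDPG actor and critic.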
Pages: 25