Reinforcement Q-Learning for PDF Tracking Control of Stochastic Systems with Unknown Dynamics

Times cited: 0
Authors
Yang, Weiqing [1 ,2 ]
Zhou, Yuyang [3 ]
Zhang, Yong [1 ,2 ]
Ren, Yan [1 ,2 ]
Affiliations
[1] Inner Mongolia Univ Sci & Technol, Sch Automat & Elect Engn, Baotou 014010, Peoples R China
[2] Inner Mongolia Univ Sci & Technol, Key Lab Synthet Automat Proc Ind Univ Inner Mo, Baotou 014010, Peoples R China
[3] Edinburgh Napier Univ, Sch Comp Engn & Built Environm, Edinburgh EH10 5DT, Scotland
Funding
National Natural Science Foundation of China;
Keywords
tracking control; probability density function; reinforcement learning; B-spline model; Q-learning; model-free; PROBABILITY DENSITY-FUNCTION; TIME-SYSTEMS;
DOI
10.3390/math12162499
Chinese Library Classification (CLC) number
O1 [Mathematics];
Subject classification code
0701 ; 070101 ;
Abstract
Tracking control of the output probability density function (PDF) is challenging, particularly when the system model is unknown and the dynamics are disturbed by multiplicative noise. To address these challenges, this paper proposes a tracking control algorithm based on reinforcement Q-learning. First, a B-spline model is used to represent the original system, which transforms the control problem into a state-weight tracking problem for the B-spline stochastic system model. Then, to handle the unknown stochastic system dynamics and the multiplicative noise, a model-free reinforcement Q-learning algorithm is used to solve the resulting control problem. Finally, simulation examples verify the effectiveness of the proposed algorithm.
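The abstract's second step (model-free Q-learning on the B-spline weight dynamics) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' algorithm. It assumes the B-spline step has already produced a linear weight model V_{k+1} = (A + w_k A1) V_k + (B + v_k B1) u_k with scalar Gaussian multiplicative noise w_k, v_k, constant target weights r, and a discounted quadratic tracking cost. All matrices, noise levels, the discount factor 0.9 and the episode counts are hypothetical. The learner fits a quadratic Q-function Q(X, u) = z^T H z on the augmented state X = [V; r] from sampled data only, using least-squares value iteration; the true matrices are used afterwards solely to check the learned gain.

```python
import numpy as np

# --- Hypothetical weight model (all numbers are illustrative assumptions) ---
# V_{k+1} = (A + w_k*A1) V_k + (B + v_k*B1) u_k,  with w_k, v_k ~ N(0, 1) independent.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
A1 = 0.1 * np.eye(2)
B = np.array([[0.5], [1.0]])
B1 = np.array([[0.1], [0.1]])
Qe, R, gamma = np.eye(2), np.eye(1), 0.9  # error weight, input weight, discount factor

n_x, n_u = 4, 1                 # augmented state X = [V; r] and scalar input u
n_z = n_x + n_u                 # z = [X; u]
iu = np.triu_indices(n_z)       # (i, j) pairs with i <= j for the quadratic basis


def phi(z):
    """Quadratic basis z_i * z_j (i <= j), so that z^T H z = phi(z) @ theta."""
    return np.outer(z, z)[iu]


def theta_to_H(theta):
    """Symmetric Q-kernel H from basis coefficients (off-diagonal terms are halved)."""
    H = np.zeros((n_z, n_z))
    H[iu] = theta
    return (H + H.T) / 2.0


def schur(H):
    """P = H_xx - H_xu H_uu^{-1} H_ux, the kernel of min_u z^T H z."""
    Hxu, Huu = H[:n_x, n_x:], H[n_x:, n_x:]
    return H[:n_x, :n_x] - Hxu @ np.linalg.solve(Huu, Hxu.T)


def gain(H):
    """Greedy feedback u = -K X with K = H_uu^{-1} H_ux."""
    return np.linalg.solve(H[n_x:, n_x:], H[n_x:, :n_x])


# 1) Collect data with exploratory inputs; the learner only stores (z_k, cost, X_{k+1}).
rng = np.random.default_rng(0)
Z, C, Xn = [], [], []
for _ in range(500):                      # episodes with random constant target weights r
    V, r = rng.normal(size=2), rng.normal(size=2)
    for _ in range(20):
        u = rng.normal(size=1)            # exploration input
        e = V - r                         # weight tracking error
        Z.append(np.concatenate([V, r, u]))
        C.append(e @ Qe @ e + u @ R @ u)  # measured stage cost
        V = (A + rng.normal() * A1) @ V + (B + rng.normal() * B1) @ u
        Xn.append(np.concatenate([V, r]))  # r_{k+1} = r_k
Phi_pinv = np.linalg.pinv(np.array([phi(z) for z in Z]))
C, Xn = np.array(C), np.array(Xn)

# 2) Q-learning value iteration: fit z_k^T H_{j+1} z_k ~ c_k + gamma * X_{k+1}^T P_j X_{k+1}.
P = np.zeros((n_x, n_x))
for _ in range(300):
    target = C + gamma * np.einsum("ki,ij,kj->k", Xn, P, Xn)
    H = theta_to_H(Phi_pinv @ target)
    P = schur(H)
K_learned = gain(H)

# 3) Model-based reference (uses the true matrices, for comparison only).
O2, o21 = np.zeros((2, 2)), np.zeros((2, 1))
G0 = np.block([[A, O2, B], [O2, np.eye(2), o21]])   # mean map z_k -> X_{k+1}
G1 = np.block([[A1, O2, o21], [O2, O2, o21]])       # coefficient of w_k
G2 = np.block([[O2, O2, B1], [O2, O2, o21]])        # coefficient of v_k
Ce = np.hstack([np.eye(2), -np.eye(2)])             # e = Ce X
Cz = np.block([[Ce.T @ Qe @ Ce, np.zeros((4, 1))], [np.zeros((1, 4)), R]])
P = np.zeros((n_x, n_x))
for _ in range(300):
    H_true = Cz + gamma * sum(G.T @ P @ G for G in (G0, G1, G2))
    P = schur(H_true)
K_model = gain(H_true)

print("learned K:", np.round(K_learned, 3))
print("model K:  ", np.round(K_model, 3))
```

The regression uses only phi(z_k) as the regressor and puts all next-step randomness in the target. Because E[c_k + gamma X_{k+1}^T P X_{k+1} | z_k] is exactly quadratic in z_k even with multiplicative noise, the least-squares fit is unbiased; regressing on phi(z_k) - gamma phi(z_{k+1}) instead would correlate the regressor with the noise.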
Pages: 15