Reinforcement Q-Learning for PDF Tracking Control of Stochastic Systems with Unknown Dynamics

Cited by: 0

Authors:

Yang, Weiqing [1,2]

Zhou, Yuyang [3]

Zhang, Yong [1,2]

Ren, Yan [1,2]

Affiliations:

[1] Inner Mongolia Univ Sci & Technol, Sch Automat & Elect Engn, Baotou 014010, Peoples R China

[2] Inner Mongolia Univ Sci & Technol, Key Lab Synthet Automat Proc Ind Univ Inner Mo, Baotou 014010, Peoples R China

[3] Edinburgh Napier Univ, Sch Comp Engn & Built Environm, Edinburgh EH10 5DT, Scotland

Source:

MATHEMATICS | 2024 / Vol. 12 / Issue 16

Funding:

National Natural Science Foundation of China;

Keywords:

tracking control; probability density function; reinforcement learning; B-spline model; Q-learning; model-free; PROBABILITY DENSITY-FUNCTION; TIME-SYSTEMS;

DOI:

10.3390/math12162499

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

Tracking control of the output probability density function presents significant challenges, particularly when dealing with unknown system models and multiplicative noise disturbances. To address these challenges, this paper introduces a novel tracking control algorithm based on reinforcement Q-learning. First, a B-spline model is employed to represent the original system, thereby transforming the control problem into a state weight tracking problem within the B-spline stochastic system model. Then, to handle the unknown stochastic system dynamics and the presence of multiplicative noise, a model-free reinforcement Q-learning algorithm is employed to solve the control problem. Finally, the proposed algorithm's effectiveness is validated through simulation examples.
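The abstract's first step, representing an output PDF through B-spline weights so that PDF tracking becomes weight tracking, can be illustrated with a small sketch. This is not the paper's formulation, which this record does not give. The output range [0, 1], the 21 linear (triangular) B-spline basis functions, the bell-shaped target PDF, and the helper names `hat_basis` and `integrate` are all assumptions made for illustration.

```python
import numpy as np

def hat_basis(y, centers, width):
    """Linear B-spline (triangular) basis at points y; shape (len(y), len(centers))."""
    y = np.asarray(y, dtype=float)[:, None]
    return np.clip(1.0 - np.abs(y - centers[None, :]) / width, 0.0, None)

def integrate(f, dy):
    """Trapezoidal rule on a uniform grid."""
    return dy * (f.sum() - 0.5 * (f[0] + f[-1]))

# ASSUMPTION: output range [0, 1] with 21 uniformly spaced basis functions.
y = np.linspace(0.0, 1.0, 401)
dy = y[1] - y[0]
centers = np.linspace(0.0, 1.0, 21)
width = centers[1] - centers[0]

# ASSUMPTION: a bell-shaped target PDF, normalised numerically to unit area.
target = np.exp(-0.5 * ((y - 0.4) / 0.1) ** 2)
target /= integrate(target, dy)

# Least-squares weights w such that sum_i w_i * B_i(y) approximates the target PDF.
B = hat_basis(y, centers, width)
w, *_ = np.linalg.lstsq(B, target, rcond=None)

approx = B @ w
rms_error = float(np.sqrt(np.mean((approx - target) ** 2)))
area = float(integrate(approx, dy))
print(w.shape, rms_error, area)
```

Once a PDF is summarised by a finite weight vector `w`, a tracking controller can aim to drive the system's weight vector toward the weights of the desired PDF, which is the kind of reduction the abstract describes. The Q-learning controller itself is not sketched here because the record gives no details of its cost function or update rule.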


Pages: 15

