Computing Stabilizing Feedback Gains via a Model-Free Policy Gradient Method

Cited by: 6
Authors
Ozaslan, Ibrahim K. [1 ]
Mohammadi, Hesameddin [1 ]
Jovanovic, Mihailo R. [1 ]
Affiliation
[1] Univ Southern Calif, Ming Hsieh Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
Source
IEEE CONTROL SYSTEMS LETTERS | 2022, Vol. 7
Funding
U.S. National Science Foundation;
Keywords
Costs; Convergence; Linear systems; Gradient methods; Computational modeling; Linear programming; Complexity theory; Data-driven control; linear quadratic regulator; model-free control; nonconvex optimization; random search method; reinforcement learning; sample complexity; CONVERGENCE;
DOI
10.1109/LCSYS.2022.3188180
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In spite of the lack of convexity, convergence and sample complexity properties were recently established for the random search method applied to the linear quadratic regulator (LQR) problem. Since policy gradient approaches require an initial stabilizing controller, we propose a model-free algorithm that searches over the set of state-feedback gains and returns a stabilizing controller in a finite number of iterations. Our algorithm involves a sequence of relaxed LQR problems for which the associated domains converge to the set of stabilizing controllers for the original continuous-time linear time-invariant system. Starting from a stabilizing controller for the relaxed problem, the proposed approach alternates between updating the controller via policy gradient iterations and decreasing the relaxation parameter in the LQR cost, while preserving stability at all iterations. By properly tuning the relaxation parameter updates, we ensure that the cost values do not exceed a uniform threshold and establish computable bounds on the total number of iterations.
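To make the procedure concrete, the following Python sketch (not taken from the letter) illustrates one possible instantiation of the alternation described in the abstract: zeroth-order (two-point random search) policy-gradient steps on a relaxed LQR cost, interleaved with decreases of a relaxation parameter gamma. The specific relaxation used here (shifting the dynamics to A - (gamma/2)I so that a larger set of gains is admissible for larger gamma), the example system matrices, the step sizes, and the schedule are all illustrative assumptions, not the authors' construction.

import numpy as np

rng = np.random.default_rng(0)

# Example open-loop unstable system (illustrative values, not from the letter).
A = np.array([[0.5, 1.0],
              [0.0, 0.3]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
n, m = B.shape

def relaxed_cost(K, gamma, horizon=5.0, dt=0.01, n_init=2):
    # Finite-horizon estimate of the LQR cost for the shifted dynamics A - (gamma/2) I;
    # only cost values of simulated rollouts are used, mimicking a model-free oracle.
    A_g = A - 0.5 * gamma * np.eye(n)
    total = 0.0
    for _ in range(n_init):
        x = rng.standard_normal(n)
        cost = 0.0
        for _ in range(int(horizon / dt)):
            u = -K @ x
            cost += (x @ Q @ x + u @ R @ u) * dt
            x = x + (A_g @ x + B @ u) * dt        # forward-Euler simulation
            if np.linalg.norm(x) > 1e6:
                return np.inf                     # treat divergence as infinite cost
        total += cost
    return total / n_init

def random_search_step(K, gamma, step=5e-3, smoothing=1e-2, n_dirs=4):
    # One zeroth-order (two-point) policy-gradient step on the relaxed cost.
    grad = np.zeros_like(K)
    for _ in range(n_dirs):
        U = rng.standard_normal(K.shape)
        df = relaxed_cost(K + smoothing * U, gamma) - relaxed_cost(K - smoothing * U, gamma)
        if np.isfinite(df):                       # skip destabilizing perturbations
            grad += df / (2.0 * smoothing) * U
    return K - step * grad / n_dirs

# Start with gamma large enough that K = 0 stabilizes the relaxed system, then
# alternate policy-gradient updates with decreases of the relaxation parameter.
gamma_0 = 2.0 * max(np.linalg.eigvals(A).real) + 1.0
K = np.zeros((m, n))
for gamma in np.linspace(gamma_0, 0.0, 6):        # illustrative relaxation schedule
    for _ in range(10):                           # inner policy-gradient iterations
        K = random_search_step(K, gamma)

print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))

Only cost values of simulated rollouts are queried inside random_search_step, so the update itself uses no model information. With a crude schedule like the one above there is, of course, no guarantee that the final gain stabilizes the original system; providing such guarantees, via proper tuning of the relaxation parameter updates, is precisely the contribution of the letter.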
Pages: 407 - 412
Page count: 6
Related Papers
50 records in total
  • [1] A convergent algorithm for computing stabilizing static output feedback gains
    Yu, JT
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49 (12) : 2271 - 2275
  • [2] A Model-Free H∞ Control Method Based on Off-Policy With Output Data Feedback
    Li, Z.
    Fan, J.-L.
    Jiang, Y.
    Chai, T.-Y.
    SCIENCE PRESS, (47) : 2182 - 2193
  • [3] Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient
    Beck, Edgar
    Bockelmann, Carsten
    Dekorsy, Armin
    2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024, : 367 - 373
  • [4] On a New Paradigm for Stock Trading Via a Model-Free Feedback Controller
    Barmish, B. Ross
    Primbs, James A.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, 61 (03) : 662 - 676
  • [5] Deterministic policy gradient adaptive dynamic programming for model-free optimal control
    Zhang, Yongwei
    Zhao, Bo
    Liu, Derong
    NEUROCOMPUTING, 2020, 387 : 40 - 50
  • [6] Model-Free Nonlinear Feedback Optimization
    He, Zhiyu
    Bolognani, Saverio
    He, Jianping
    Dorfler, Florian
    Guan, Xinping
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (07) : 4554 - 4569
  • [7] Combining Model-Based Design and Model-Free Policy Optimization to Learn Safe, Stabilizing Controllers
    Westenbroek, Tyler
    Agrawal, Ayush
    Castaneda, Fernando
    Sastry, S. Shankar
    Sreenath, Koushil
    IFAC PAPERSONLINE, 2021, 54 (05): : 19 - 24
  • [8] Gradient flow approach to computing LQ optimal output feedback gains
    Yan, Wei-Yong
    Teo, Kok L.
    Moore, John B.
    OPTIMAL CONTROL APPLICATIONS AND METHODS, 1994, 15 (01) : 67 - 75
  • [9] Plume Tracing via Model-Free Reinforcement Learning Method
    Hu, Hangkai
    Song, Shiji
    Chen, C. L. Phillip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (08) : 2515 - 2527
  • [10] Model-free Control Design Using Policy Gradient Reinforcement Learning in LPV Framework
    Bao, Yajie
    Velni, Javad Mohammadpour
    2021 EUROPEAN CONTROL CONFERENCE (ECC), 2021, : 150 - 155