Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control

Cited by: 66
Authors
Wu, Huai-Ning [1]
Luo, Biao [1]
Affiliations
[1] Beijing Univ Aeronaut & Astronaut, Beihang Univ, Sch Automat Sci & Elect Engn, Sci & Technol Aircraft Control Lab, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Simultaneous policy update algorithm; H-infinity state feedback control; Algebraic Riccati equation; Lyapunov equation; Offline; Online; ZERO-SUM GAMES; STABILIZING SOLUTION; RICCATI-EQUATIONS; APPROXIMATIONS
DOI
10.1016/j.ins.2012.08.012
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
It is well known that the H-infinity state feedback control problem can be viewed as a two-player zero-sum game and reduced to finding a solution of the algebraic Riccati equation (ARE). In this paper, we propose a simultaneous policy update algorithm (SPUA) for solving the ARE, and develop offline and online versions. The offline SPUA is a model-based approach that obtains the solution of the ARE by solving a sequence of Lyapunov equations (LEs); its convergence is established rigorously by constructing a Newton sequence for the associated fixed-point equation. The online SPUA is a partially model-free approach that draws on the idea of reinforcement learning (RL) to learn the solution of the ARE online without requiring the internal system dynamics, with both players updating their action policies simultaneously. The convergence of the online SPUA is proved by showing that it is mathematically equivalent to the offline SPUA. Finally, comparative simulation studies on an F-16 aircraft plant and a power system show that both the offline SPUA and the online SPUA find the solution of the ARE and achieve much better convergence than existing methods. (C) 2012 Elsevier Inc. All rights reserved.
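The offline SPUA summarized above reduces the game ARE to a sequence of Lyapunov equations solved along a Newton sequence. The sketch below is a minimal reconstruction of that idea under stated assumptions, not the paper's implementation: the function name offline_spua, the zero initial value matrix, and the 2-state plant matrices are hypothetical placeholders (the paper's F-16 and power-system examples are not reproduced here).

```python
# A minimal sketch (not the paper's implementation) of the offline SPUA /
# Newton-type iteration: the H-infinity game ARE is solved through a
# sequence of Lyapunov equations, with both players' policies updated
# simultaneously at every step.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def offline_spua(A, B1, B2, Q, R, gamma, max_iters=50, tol=1e-10):
    """Iterate on the game ARE
        A'P + PA + Q - P (B2 R^{-1} B2' - gamma^{-2} B1 B1') P = 0,
    solving one Lyapunov equation per step for the next value matrix."""
    n = A.shape[0]
    S = B2 @ np.linalg.solve(R, B2.T) - (B1 @ B1.T) / gamma**2
    P = np.zeros((n, n))  # initial value matrix; assumes A - S P is stable here
    for _ in range(max_iters):
        Ac = A - S @ P  # closed loop under both players' current policies
        # Lyapunov equation:  Ac' P_next + P_next Ac + Q + P S P = 0
        P_next = solve_continuous_lyapunov(Ac.T, -(Q + P @ S @ P))
        if np.linalg.norm(P_next - P) < tol:
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R, B2.T @ P)   # control policy      u = -K x
    L = (B1.T @ P) / gamma**2          # disturbance policy  w =  L x
    return P, K, L


if __name__ == "__main__":
    # Illustrative 2-state plant (hypothetical numbers, not the paper's examples).
    A = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B1 = np.array([[0.0], [0.5]])  # disturbance input
    B2 = np.array([[0.0], [1.0]])  # control input
    Q, R = np.eye(2), np.eye(1)
    P, K, L = offline_spua(A, B1, B2, Q, R, gamma=2.0)
    print("P =\n", P)
```

Each pass forms the closed-loop matrix under both players' current policies and solves a single Lyapunov equation for the next value matrix, mirroring the Newton sequence whose convergence the paper establishes; the online SPUA learns the same quantities from measured data without the internal dynamics.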
Pages: 472-485
Page count: 14