ACTOR-CRITIC METHOD FOR HIGH DIMENSIONAL STATIC HAMILTON-JACOBI-BELLMAN PARTIAL DIFFERENTIAL EQUATIONS BASED ON NEURAL NETWORKS

Cited by: 25
Authors
Zhou, Mo [1 ]
Han, Jiequn [2 ]
Lu, Jianfeng [3 ,4 ]
Affiliations
[1] Duke Univ, Dept Math, Durham, NC 27708 USA
[2] Princeton Univ, Dept Math, Princeton, NJ 08544 USA
[3] Duke Univ, Dept Math, Dept Phys, Durham, NC 27708 USA
[4] Duke Univ, Dept Chem, Durham, NC 27708 USA
Funding
National Science Foundation (USA);
Keywords
Hamilton-Jacobi-Bellman equations; high dimensional partial differential equations; stochastic control; actor-critic methods; NUMERICAL-SOLUTION; REACHABLE SETS; APPROXIMATION; ALGORITHMS;
DOI
10.1137/21M1402303
CLC Classification Number
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
We propose a novel numerical method for high dimensional Hamilton-Jacobi-Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs, reformulated as optimal control problems, are tackled by an actor-critic framework inspired by reinforcement learning, based on neural network parametrization of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function we derive a variance-reduced least-squares temporal difference method using stochastic calculus. To discretize the stochastic control problem numerically, we employ an adaptive step size scheme that improves accuracy near the domain boundary. Numerical examples in up to 20 spatial dimensions, including linear quadratic regulators, stochastic Van der Pol oscillators, diffusive Eikonal equations, and fully nonlinear elliptic PDEs derived from a regulator problem, are presented to validate the effectiveness of the proposed method.
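As a rough illustration of the actor-critic idea described in the abstract, the following is a minimal PyTorch sketch for a hypothetical controlled diffusion dX_t = u(X_t) dt + sqrt(2) dW_t with discounted quadratic running cost. All choices here (network sizes, dynamics, discount rate, the plain one-step temporal-difference loss for the critic, and the pathwise cost gradient for the actor) are placeholder assumptions made for illustration; the paper's variance-reduced least-squares temporal difference estimator and its adaptive step size scheme near the boundary are not reproduced.

import math
import torch
import torch.nn as nn

# Placeholder problem setup (assumption, not from the paper): controlled diffusion
# dX_t = u(X_t) dt + sqrt(2) dW_t on R^d, running cost |x|^2 + |u|^2, discount gamma.
d, dt, gamma, n_steps, batch = 5, 0.01, 1.0, 50, 256

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 64), nn.Tanh(),
                         nn.Linear(64, 64), nn.Tanh(),
                         nn.Linear(64, d_out))

value_net = mlp(d, 1)      # critic: neural network approximation of the value function
control_net = mlp(d, d)    # actor: neural network approximation of the feedback control
opt_critic = torch.optim.Adam(value_net.parameters(), lr=1e-3)
opt_actor = torch.optim.Adam(control_net.parameters(), lr=1e-3)

def simulate(x):
    # Euler-Maruyama rollout of the controlled SDE; accumulate discounted running cost.
    xs, cost = [x], torch.zeros(x.shape[0])
    for k in range(n_steps):
        u = control_net(x)
        cost = cost + math.exp(-gamma * k * dt) * ((x ** 2).sum(1) + (u ** 2).sum(1)) * dt
        x = x + u * dt + math.sqrt(2.0 * dt) * torch.randn_like(x)
        xs.append(x)
    return xs, cost

for it in range(1000):
    x0 = torch.randn(batch, d)

    # Critic step: fit the value network with a one-step temporal-difference
    # least-squares loss along a trajectory generated by the current control.
    with torch.no_grad():
        xs, _ = simulate(x0)
    td_loss = 0.0
    for k in range(n_steps):
        xk, xk1 = xs[k], xs[k + 1]
        with torch.no_grad():
            u = control_net(xk)
            running = (xk ** 2).sum(1, keepdim=True) + (u ** 2).sum(1, keepdim=True)
            target = running * dt + (1.0 - gamma * dt) * value_net(xk1)
        td_loss = td_loss + ((value_net(xk) - target) ** 2).mean()
    opt_critic.zero_grad()
    td_loss.backward()
    opt_critic.step()

    # Actor step: pathwise gradient of the simulated discounted cost, bootstrapped
    # with the critic's value at the final state, taken with respect to the control network.
    xs, cost = simulate(x0)
    terminal = math.exp(-gamma * n_steps * dt) * value_net(xs[-1]).squeeze(1)
    actor_loss = (cost + terminal).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

This sketch only conveys the alternation between a value (critic) update and a control (actor) update on simulated trajectories; the variance reduction via stochastic calculus and the boundary-adaptive time stepping highlighted in the abstract would replace the naive TD target and the fixed dt used above.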
Pages: A4043-A4066
Page count: 24