Performance Guarantees for Model-Based Approximate Dynamic Programming in Continuous Spaces

Cited by: 11

Authors:

Beuchat, Paul Nathaniel [1]

Georghiou, Angelos [2]

Lygeros, John [1]

Affiliations:

[1] Swiss Fed Inst Technol, Automat Control Lab, CH-8092 Zurich, Switzerland

[2] McGill Univ, Desautels Fac Management, Montreal, PQ H3A 0G4, Canada

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2020 / Vol. 65 / No. 1

Funding:

European Research Council

Keywords:

Aerospace electronics; Dynamic programming; Optimal control; Numerical models; Linear programming; Stochastic processes; Mathematical model; Discrete-time systems; dynamic programming; infinite horizon optimal control; stochastic systems;

DOI:

10.1109/TAC.2019.2906423

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We study both the value function and Q-function formulations of the linear programming approach to approximate dynamic programming. The approach is model based and optimizes over a restricted function space to approximate the value function or Q-function. Working in the discrete-time, continuous-space setting, we provide guarantees for the fitting error and the online performance of the policy. In particular, the online performance guarantee is obtained by analyzing an iterated version of the greedy policy, and the fitting error guarantee by analyzing an iterated version of the Bellman inequality. These guarantees complement the existing bounds that appear in the literature. The Q-function formulation offers benefits, for example, in decentralized controller design; however, it can lead to computationally demanding optimization problems. To alleviate this drawback, we provide a condition that simplifies the formulation, resulting in improved computational times.


Pages: 143-158

Page count: 16

