Approximate dynamic programming with a fuzzy parameterization

Cited by: 48

Authors:

Busoniu, Lucian [1]

Ernst, Damien [2]

De Schutter, Bart [1]

Babuska, Robert [1]

Affiliations:

[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands

[2] Univ Liege, Inst Montefiore, FNRS, B-4000 Liege, Belgium

Source:

AUTOMATICA | 2010 / Vol. 46 / Issue 05

Keywords:

Approximate dynamic programming; Fuzzy approximation; Value iteration; Convergence analysis; ALGORITHM;

DOI:

10.1016/j.automatica.2010.02.006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space, and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes, under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion, and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem. (C) 2010 Elsevier Ltd. All rights reserved.
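The update scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes normalized triangular membership functions on a one-dimensional state grid, a toy deterministic process, and a discount factor of 0.95, none of which come from this record. The names triangular_memberships, fuzzy_q_iteration, greedy_action, f, and rho are hypothetical and chosen only for illustration. Each parameter theta[i, j] estimates the Q-value of discrete action j at the center of fuzzy set i, and the Q-value elsewhere is the membership-weighted sum of these parameters.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Normalized triangular membership degrees of scalar x (assumed fuzzy partition)."""
    x = float(np.clip(x, centers[0], centers[-1]))
    phi = np.zeros(len(centers))
    # Index of the center to the left of x, capped so that k + 1 stays valid.
    k = min(int(np.searchsorted(centers, x, side="right")) - 1, len(centers) - 2)
    w = (x - centers[k]) / (centers[k + 1] - centers[k])
    phi[k], phi[k + 1] = 1.0 - w, w
    return phi

def fuzzy_q_iteration(f, rho, centers, actions, gamma=0.95, tol=1e-8,
                      max_iter=10_000, asynchronous=False):
    """Iterate theta[i, j] <- rho(x_i, u_j) + gamma * max_j' Q(f(x_i, u_j), u_j')."""
    n, m = len(centers), len(actions)
    # The process is deterministic, so next-state memberships can be precomputed.
    next_phi = np.array([[triangular_memberships(f(x, u), centers) for u in actions]
                         for x in centers])                      # shape (n, m, n)
    rewards = np.array([[rho(x, u) for u in actions] for x in centers])  # shape (n, m)
    theta = np.zeros((n, m))
    for _ in range(max_iter):
        old = theta.copy()
        if asynchronous:
            # In-place sweep: each update already sees the parameters changed before it.
            for i in range(n):
                for j in range(m):
                    theta[i, j] = rewards[i, j] + gamma * np.max(next_phi[i, j] @ theta)
        else:
            # Synchronous sweep: every update uses the previous iterate only.
            theta = rewards + gamma * np.max(next_phi @ old, axis=2)
        if np.max(np.abs(theta - old)) < tol:
            break
    return theta

def greedy_action(x, theta, centers, actions):
    """Discrete action maximizing the interpolated Q-value at state x."""
    q = triangular_memberships(x, centers) @ theta
    return actions[int(np.argmax(q))]

# Toy problem (an assumption for illustration): drive a scalar state toward zero.
centers = np.linspace(-1.0, 1.0, 11)
actions = np.array([-0.2, 0.0, 0.2])
f = lambda x, u: np.clip(x + u, -1.0, 1.0)   # deterministic dynamics
rho = lambda x, u: -x ** 2                   # reward penalizes distance from zero

theta_sync = fuzzy_q_iteration(f, rho, centers, actions)
theta_async = fuzzy_q_iteration(f, rho, centers, actions, asynchronous=True)
```

In this toy setting both variants stop at essentially the same parameter matrix, and greedy_action(0.5, theta_sync, centers, actions) picks the action that moves the state toward zero. The sketch says nothing about the error bounds or consistency results that the abstract reports; those are established in the paper itself.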


Pages: 804-814

Page count: 11
