Stochastic learning and optimization - A sensitivity-based approach

Cited by: 39
Author
Cao, Xi-Ren [1 ]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Perturbation analysis; Markov decision processes; Reinforcement learning; Stochastic control; Performance potentials; Event-based optimization; POLICY-GRADIENT ESTIMATION; GLOBAL OPTIMIZATION; INFINITE-HORIZON; ALGORITHMS
DOI
10.1016/j.arcontrol.2009.03.003
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We introduce a sensitivity-based view of the area of learning and optimization of stochastic dynamic systems. We show that this sensitivity-based view provides a unified framework for many different disciplines in this area, including perturbation analysis, Markov decision processes, reinforcement learning, identification and adaptive control, and singular stochastic control; and that this unified framework applies to both discrete-event dynamic systems and continuous-time continuous-state systems. Many results in these disciplines can be simply derived and intuitively explained by using two performance sensitivity formulas. In addition, we show that this sensitivity-based view leads to new results and opens up new directions for future research. For example, the nth-bias optimality of Markov processes has been established, and event-based optimization may be developed; this approach has computational and other advantages over state-based approaches. (C) 2009 Elsevier Ltd. All rights reserved.
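
The abstract's claim that many results follow from "two performance sensitivity formulas" can be made concrete. Below is a minimal numerical sketch, assuming the standard forms of the performance difference and performance derivative formulas for ergodic finite Markov chains under the long-run average-reward criterion: with transition matrices P and P', rewards f and f', steady-state distributions pi and pi', and potentials g solving the Poisson equation, eta' - eta = pi' [(P' - P) g + (f' - f)] and d(eta_d)/dd at d = 0 equals pi [(P' - P) g + (f' - f)]. The matrices, rewards, and function names are illustrative assumptions, not taken from the paper.

import numpy as np

# Minimal sketch (illustrative, not from the paper): verify the two sensitivity
# formulas on a hypothetical 3-state ergodic Markov chain with average reward
# eta = pi @ f, where pi is the steady-state distribution of P.

def steady_state(P):
    # Solve pi P = pi with sum(pi) = 1 for an ergodic chain.
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    # Performance potentials g from the Poisson equation (I - P + e pi) g = f.
    n = P.shape[0]
    pi = steady_state(P)
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)

# Hypothetical original system (P, f) and perturbed system (P2, f2).
P  = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
f  = np.array([1.0, 2.0, 3.0])
P2 = np.array([[0.4, 0.4, 0.2], [0.1, 0.7, 0.2], [0.3, 0.2, 0.5]])
f2 = np.array([1.5, 2.0, 2.5])

pi, pi2 = steady_state(P), steady_state(P2)
g = potentials(P, f)
eta, eta2 = pi @ f, pi2 @ f2

# Performance difference formula: eta2 - eta = pi2 @ ((P2 - P) @ g + (f2 - f)).
print(eta2 - eta, pi2 @ ((P2 - P) @ g + (f2 - f)))

# Performance derivative formula along P_d = P + d (P2 - P), f_d = f + d (f2 - f):
# the derivative at d = 0 equals pi @ ((P2 - P) @ g + (f2 - f)); check by finite difference.
d = 1e-6
eta_d = steady_state(P + d * (P2 - P)) @ (f + d * (f2 - f))
print((eta_d - eta) / d, pi @ ((P2 - P) @ g + (f2 - f)))

Each printed pair agrees to numerical precision, which is the sense in which the abstract says many results in these disciplines can be simply derived from the two formulas.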
Pages: 11-24
Page count: 14