Stochastic learning and optimization - A sensitivity-based approach

Cited by: 39
Author
Cao, Xi-Ren [1 ]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Perturbation analysis; Markov decision processes; Reinforcement learning; Stochastic control; Performance potentials; Event-based optimization; POLICY-GRADIENT ESTIMATION; GLOBAL OPTIMIZATION; INFINITE-HORIZON; ALGORITHMS
DOI
10.1016/j.arcontrol.2009.03.003
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We introduce a sensitivity-based view of the area of learning and optimization of stochastic dynamic systems. We show that this sensitivity-based view provides a unified framework for many different disciplines in this area, including perturbation analysis, Markov decision processes, reinforcement learning, identification and adaptive control, and singular stochastic control; and that this unified framework applies to both discrete-event dynamic systems and continuous-time continuous-state systems. Many results in these disciplines can be simply derived and intuitively explained by using two performance sensitivity formulas. In addition, we show that this sensitivity-based view leads to new results and opens up new directions for future research. For example, the nth-bias optimality of Markov processes has been established, and event-based optimization may be developed; this approach has computational and other advantages over state-based approaches. (C) 2009 Elsevier Ltd. All rights reserved.
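
The abstract's claim that many results follow from "two performance sensitivity formulas" can be made concrete. Below is a minimal numerical sketch, assuming the standard forms of the performance difference and performance derivative formulas for ergodic finite Markov chains under the long-run average-reward criterion: with transition matrices P and P', rewards f and f', steady-state distributions pi and pi', and potentials g solving the Poisson equation, eta' - eta = pi' [(P' - P) g + (f' - f)] and d(eta_d)/dd at d = 0 equals pi [(P' - P) g + (f' - f)]. The matrices, rewards, and function names are illustrative assumptions, not taken from the paper.

import numpy as np

# Minimal sketch (illustrative, not from the paper): verify the two sensitivity
# formulas on a hypothetical 3-state ergodic Markov chain with average reward
# eta = pi @ f, where pi is the steady-state distribution of P.

def steady_state(P):
    # Solve pi P = pi with sum(pi) = 1 for an ergodic chain.
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    # Performance potentials g from the Poisson equation (I - P + e pi) g = f.
    n = P.shape[0]
    pi = steady_state(P)
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)

# Hypothetical original system (P, f) and perturbed system (P2, f2).
P  = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
f  = np.array([1.0, 2.0, 3.0])
P2 = np.array([[0.4, 0.4, 0.2], [0.1, 0.7, 0.2], [0.3, 0.2, 0.5]])
f2 = np.array([1.5, 2.0, 2.5])

pi, pi2 = steady_state(P), steady_state(P2)
g = potentials(P, f)
eta, eta2 = pi @ f, pi2 @ f2

# Performance difference formula: eta2 - eta = pi2 @ ((P2 - P) @ g + (f2 - f)).
print(eta2 - eta, pi2 @ ((P2 - P) @ g + (f2 - f)))

# Performance derivative formula along P_d = P + d (P2 - P), f_d = f + d (f2 - f):
# the derivative at d = 0 equals pi @ ((P2 - P) @ g + (f2 - f)); check by finite difference.
d = 1e-6
eta_d = steady_state(P + d * (P2 - P)) @ (f + d * (f2 - f))
print((eta_d - eta) / d, pi @ ((P2 - P) @ g + (f2 - f)))

Each printed pair agrees to numerical precision, which is the sense in which the abstract says many results in these disciplines can be simply derived from the two formulas.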
Pages: 11-24
Page count: 14