POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Authors
Zhou, Yi [1]
Fu, Michael C. [2]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Source
2022 WINTER SIMULATION CONFERENCE (WSC) | 2022
Keywords
OPTIMIZATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
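The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It relies on the standard identity E[Y | X = x] = (d/dx E[Y 1{X <= x}]) / (d/dx E[1{X <= x}]) and replaces both derivatives with central finite differences over sample paths that start from many different states. The toy dynamics, the function names `simulate_return` and `fd_value_estimate`, and the bandwidth `h` are all hypothetical choices made for illustration.

```python
import random


def simulate_return(x0, rng, horizon=3):
    """Total reward of one sample path from x0 (hypothetical toy model).

    Assumed dynamics: x_{t+1} = 0.5 * x_t + Gaussian noise, reward = x_t.
    Under this assumption the true value is V(x0) = 1.75 * x0.
    """
    total, x = 0.0, x0
    for _ in range(horizon):
        total += x
        x = 0.5 * x + rng.gauss(0.0, 0.1)
    return total


def fd_value_estimate(starts, returns, x, h):
    """Estimate E[return | start = x] as a ratio of two finite differences.

    Numerator:   central difference of the sample mean of return * 1{start <= t}.
    Denominator: central difference of the sample mean of 1{start <= t}.
    """
    n = len(starts)

    def numerator(t):
        return sum(r for s, r in zip(starts, returns) if s <= t) / n

    def denominator(t):
        return sum(1 for s in starts if s <= t) / n

    d_num = (numerator(x + h) - numerator(x - h)) / (2 * h)
    d_den = (denominator(x + h) - denominator(x - h)) / (2 * h)
    if d_den == 0:
        raise ValueError("no sample paths start near x; increase h")
    return d_num / d_den


if __name__ == "__main__":
    rng = random.Random(0)
    # Sample paths start from various states, none of them exactly x = 0.4.
    starts = [rng.random() for _ in range(20000)]
    returns = [simulate_return(s, rng) for s in starts]
    print(fd_value_estimate(starts, returns, x=0.4, h=0.05))
```

In this sketch the bandwidth `h` trades bias against variance: a smaller `h` uses fewer paths near the target state, while a larger `h` averages over states whose values differ more from the target.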
Pages: 3039-3050
Page count: 12