POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited by: 0

Authors:

Zhou, Yi [1]

Fu, Michael C. [2]

Ryzhov, Ilya O.

Affiliations:

[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA

[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA

Source:

2022 WINTER SIMULATION CONFERENCE (WSC) | 2022

Keywords:

OPTIMIZATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
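The abstract's key step, writing a conditional expectation as a ratio of two derivatives, can be illustrated with one standard identity. If the starting state $X$ has a positive continuous density at $x$, then $E[Y \mid X = x] = \frac{\partial_x E[Y\,\mathbf{1}\{X \le x\}]}{\partial_x P(X \le x)}$. The Python below is a minimal sketch, assuming that identity, of the finite-difference route only: each derivative is replaced by a central difference of width $2h$ and each expectation by a sample average. It is not the authors' estimator, and the generalized likelihood ratio variant is not shown. The names `fd_conditional_mean` and `simulate`, and the toy model $Y = 2X + \text{noise}$, are hypothetical and chosen for illustration.

```python
import random


def fd_conditional_mean(samples, x, h):
    """Central finite-difference estimate of E[Y | X = x].

    Sketch of E[Y | X = x] = d/dx E[Y 1{X <= x}] / d/dx P(X <= x),
    with each derivative replaced by a central difference of width 2h
    and each expectation replaced by a sample average over (X_i, Y_i).
    """
    # Numerator, sample version of E[Y 1{X <= x+h}] - E[Y 1{X <= x-h}]
    num = sum(y for (xi, y) in samples if x - h < xi <= x + h)
    # Denominator, sample version of P(X <= x+h) - P(X <= x-h)
    den = sum(1 for (xi, _) in samples if x - h < xi <= x + h)
    # The common factor 1 / (2h * n) cancels in the ratio, so it is omitted.
    if den == 0:
        raise ValueError("no sample paths start in the window; increase h or n")
    return num / den


def simulate(n, seed=0):
    """Hypothetical toy model: paths start from many different states."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)           # starting state of this sample path
        y = 2.0 * x + rng.gauss(0.0, 0.1)   # noisy return observed from that state
        samples.append((x, y))
    return samples


samples = simulate(200_000)
# Value of state x = 0.5 (true value 1.0 in this toy model), estimated
# from paths that started at other states near 0.5.
print(round(fd_conditional_mean(samples, 0.5, 0.01), 2))
```

In this finite-difference form only paths whose starting state falls inside $(x - h, x + h]$ contribute, so $h$ trades bias (a wide window) against variance (few paths in a narrow window).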

Pages: 3039-3050

Number of pages: 12