POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Cited by: 0

Authors:

Zhou, Yi [1]

Fu, Michael C. [2]

Ryzhov, Ilya O.

Affiliations:

[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA

[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA

Source:

2022 WINTER SIMULATION CONFERENCE (WSC) | 2022

Keywords:

OPTIMIZATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
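The ratio-of-gradients idea in the abstract can be illustrated with a small sketch. Under regularity conditions, E[Y | X = x] equals the derivative in x of E[Y 1{X <= x}] divided by the derivative in x of E[1{X <= x}]. Replacing both derivatives with central finite differences over the same sample gives the estimator below. This is only an illustration under stated assumptions, not the authors' estimator: the function name `fd_conditional_mean`, the bandwidth `h`, and the toy model Y = 2X + noise are all hypothetical, and the generalized likelihood ratio variant is not shown.

```python
import random


def fd_conditional_mean(xs, ys, x0, h):
    """Finite-difference estimate of E[Y | X = x0] as a ratio of two
    differenced sample means (hypothetical helper, illustrative only).

    The central difference of the indicator 1{X <= x} between x0 - h and
    x0 + h equals 1{x0 - h < X <= x0 + h}, so the common 1 / (2 * h * n)
    factors in numerator and denominator cancel.
    """
    num = 0.0  # differenced estimate of E[Y * 1{X <= x}]
    den = 0    # differenced estimate of E[1{X <= x}]
    for x, y in zip(xs, ys):
        if x0 - h < x <= x0 + h:
            num += y
            den += 1
    if den == 0:
        raise ValueError("no samples fall within h of x0; increase h or n")
    return num / den


# Toy model (an assumption, not from the paper): X ~ N(0, 1), Y = 2X + noise,
# so the true conditional mean at x0 = 0.5 is 1.0.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
estimate = fd_conditional_mean(xs, ys, x0=0.5, h=0.1)
print(f"estimated E[Y | X = 0.5]: {estimate:.3f}")
```

None of the samples needs to sit exactly at x0, which mirrors the abstract's point that the value of a given state can be estimated from sample paths starting at other states. A smaller `h` reduces the finite-difference bias but leaves fewer samples in the window, so the variance grows.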

Citation:

Pages: 3039-3050

Page count: 12
