POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES

Times cited: 0
Authors
Zhou, Yi [1]
Fu, Michael C. [2]
Ryzhov, Ilya O.
Affiliations
[1] Univ Maryland, Inst Syst Res, Dept Math, 8223 Paint Branch Dr, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, Robert H Smith Sch Business, 7699 Mowatt Ln, College Pk, MD 20742 USA
Source
2022 WINTER SIMULATION CONFERENCE (WSC) | 2022
Keywords
OPTIMIZATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
In this paper, we consider policy evaluation in a finite-horizon setting with continuous state variables. The Bellman equation represents the value function as a conditional expectation, which can be further transformed into a ratio of two stochastic gradients. By using the finite difference method and the generalized likelihood ratio method, we propose new estimators for policy evaluation and show how the value of any given state can be estimated using sample paths starting from various other states.
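The ratio-of-derivatives idea in the abstract can be illustrated with a small sketch. Let X be the starting state and G the finite-horizon return. For smooth joint distributions where the density of X is positive at x, E[G | X = x] = (d/dx E[G·1{X ≤ x}]) / (d/dx P(X ≤ x)). The denominator is the density of X at x, and the numerator is that density times the conditional mean. The Python sketch estimates both derivatives with central finite differences from paths whose starting states are spread over an interval. Paths that start near x, but not exactly at x, therefore contribute to the value estimate at x. It is not the authors' estimator, and it does not cover their generalized likelihood ratio variant. The following are all hypothetical choices made for illustration:
- the linear dynamics;
- the reward -s^2;
- the policy a = -0.5·s;
- the function names fd_conditional_expectation and simulate_returns.

```python
import numpy as np


def fd_conditional_expectation(x, y, x0, h):
    """Estimate E[Y | X = x0] as a ratio of two central finite differences.

    numerator   ~ d/dx E[Y * 1{X <= x}] at x0
    denominator ~ d/dx P(X <= x)        at x0  (the density of X at x0)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # 1{X <= x0 + h} - 1{X <= x0 - h} is the indicator of x0 - h < X <= x0 + h
    window = (x <= x0 + h).astype(float) - (x <= x0 - h).astype(float)
    numerator = np.mean(y * window) / (2.0 * h)
    denominator = np.mean(window) / (2.0 * h)
    if denominator == 0.0:
        raise ValueError("no samples in (x0 - h, x0 + h]; increase h or the sample size")
    return numerator / denominator


def simulate_returns(s0, horizon, rng, gain=0.5, noise_sd=0.1):
    """Finite-horizon return of a hypothetical toy system, one path per entry of s0.

    Assumed dynamics: s' = s + a + noise_sd * eps with eps ~ N(0, 1),
    hypothetical policy a = -gain * s, reward -s**2 collected at each of the
    `horizon` steps, and no terminal reward.
    """
    s = np.array(s0, dtype=float)
    total = np.zeros_like(s)
    for _ in range(horizon):
        total += -s**2
        a = -gain * s
        s = s + a + noise_sd * rng.standard_normal(s.shape)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sample paths start from many different states, not only from x0.
    starts = rng.uniform(-2.0, 2.0, size=200_000)
    returns = simulate_returns(starts, horizon=3, rng=rng)
    estimate = fd_conditional_expectation(starts, returns, x0=1.0, h=0.05)
    exact = -(1.3125 * 1.0**2 + 0.0225)  # closed form for this toy system
    print(f"FD estimate of V_0(1.0): {estimate:.4f}  exact: {exact:.4f}")
```

With central differences, the factor 1/(2h) cancels in the ratio. The estimate then reduces to the average return over paths that start in (x0 - h, x0 + h]. A smaller h lowers the bias but leaves fewer paths in the window, which raises the variance. For this toy system with horizon 3, the exact value is V_0(s) = -(1.3125 s^2 + 0.0225), so V_0(1.0) = -1.335.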
Pages: 3039-3050
Number of pages: 12