Bounds for Off-policy Prediction in Reinforcement Learning

Cited by: 0
Authors
Joseph, Ajin George [1]
Bhatnagar, Shalabh [1,2]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore, Karnataka, India
[2] Indian Inst Sci, Robert Bosch Ctr Cyber Phys Syst, Bangalore, Karnataka, India
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy under a linear function approximation architecture, using a sample trajectory generated by a behaviour policy that is possibly different from the target policy. The stability of off-policy prediction was an open question for a long time; only recently did Yu provide a general proof, which makes our results more appealing to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where an on-policy sample trajectory is hard to obtain and one has to rely on the sample behaviour of the system under an arbitrary policy. We provide an error bound on the solution of the off-policy prediction problem in terms of a closeness measure between the target and behaviour policies.
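The quantity the abstract's bound concerns, the solution of the off-policy prediction problem under linear function approximation, can be computed in closed form on a tiny synthetic MDP. The following is a minimal sketch, not the paper's method: all sizes, policies, rewards, and features below are illustrative assumptions. The off-policy TD fixed point solves A theta = b with A = Phi^T D (I - gamma P_pi) Phi and b = Phi^T D r_pi, where D weights states by the behaviour policy's stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic MDP (hypothetical example; sizes, policies, and
# features are illustrative, not taken from the paper).
n_states, n_actions, n_features, gamma = 4, 2, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state dist.
R = rng.standard_normal((n_states, n_actions))                    # expected reward r(s, a)
Phi = rng.standard_normal((n_states, n_features))                 # feature matrix, rows phi(s)

pi = np.full((n_states, n_actions), [0.9, 0.1])  # target policy
mu = np.full((n_states, n_actions), 0.5)         # behaviour policy (covers pi)

# State dynamics and rewards induced by the target policy.
P_pi = np.einsum('sa,san->sn', pi, P)
r_pi = np.einsum('sa,sa->s', pi, R)

# Stationary state distribution of the behaviour policy's chain.
P_mu = np.einsum('sa,san->sn', mu, P)
evals, evecs = np.linalg.eig(P_mu.T)
d = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
d /= d.sum()
D = np.diag(d)

# Off-policy TD fixed point: solve A @ theta = b.
A = Phi.T @ D @ (np.eye(n_states) - gamma * P_pi) @ Phi
b = Phi.T @ D @ r_pi
theta = np.linalg.solve(A, b)

# True value function, for comparison against the approximation Phi @ theta.
v_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
print(theta, Phi @ theta, v_pi)
```

The gap between Phi @ theta and v_pi grows as the behaviour policy mu drifts away from the target policy pi (here mu reweights states through D), which is the dependence the paper's bound quantifies.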
Pages: 3991-3997
Page count: 7