Bounds for Off-policy Prediction in Reinforcement Learning

Authors:

Joseph, Ajin George [1]

Bhatnagar, Shalabh [1,2]

Affiliations:

[1] Indian Institute of Science, Department of Computer Science and Automation, Bangalore, Karnataka, India

[2] Indian Institute of Science, Robert Bosch Centre for Cyber-Physical Systems, Bangalore, Karnataka, India

Source:

2017 International Joint Conference on Neural Networks (IJCNN) | 2017

DOI:

Not available

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective of off-policy prediction is to estimate the value function of a given target policy using a linear function approximation architecture, from a sample trajectory generated by a behaviour policy that may differ from the target policy. The stability of off-policy prediction was an open question for a long time; only recently did Yu provide a generalized proof, which makes our results more appealing to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where sample trajectories are hard to obtain and one has to rely on the sampled behaviour of the system under an arbitrary policy. Here we provide an error bound on the solution of off-policy prediction in terms of a closeness measure between the target and behaviour policies.
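The following sketch illustrates the off-policy prediction setting described in the abstract. It is not the authors' algorithm and does not reproduce their error bound. It estimates the value of a target policy π from a single trajectory generated by a behaviour policy μ. The method is off-policy LSTD(0) with per-step importance-sampling ratios ρ = π(a|s)/μ(a|s). Everything else is a hypothetical choice made only for this example: the 3-state MDP, both policies, the discount factor, and the names `off_policy_lstd` and `true_value`. With tabular (one-hot) features, the estimate should approach the exact V^π. With a general feature matrix `phi`, the same code solves an importance-weighted projected Bellman equation. That solution need not be well behaved for arbitrary features and policy pairs, which is the kind of question that stability proofs and error bounds address.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP, for illustration only.
# P[s, a, s'] is a transition probability; R[s, a] is the reward for taking a in s.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    [[0.3, 0.5, 0.2], [0.2, 0.2, 0.6]],
    [[0.4, 0.4, 0.2], [0.5, 0.1, 0.4]],
])
R = np.array([[1.0, 0.0], [0.0, 0.5], [0.2, 1.0]])
GAMMA = 0.8

# Target policy PI (whose value we want) and behaviour policy MU (which generates data).
PI = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
MU = np.full((3, 2), 0.5)


def true_value(pi):
    """Exact V^pi = (I - gamma * P_pi)^{-1} r_pi, used only to check the estimate."""
    p_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - GAMMA * p_pi, r_pi)


def off_policy_lstd(phi, n_steps=100_000, seed=0):
    """Off-policy LSTD(0) from one behaviour-policy trajectory.

    Accumulates
        A = sum_t rho_t * phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T
        b = sum_t rho_t * phi(s_t) * r_t,   rho_t = PI[s_t, a_t] / MU[s_t, a_t],
    and returns theta solving A theta = b (value estimate is phi @ theta).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    cum_mu = np.cumsum(MU, axis=1)  # CDFs for sampling actions from MU
    cum_p = np.cumsum(P, axis=2)    # CDFs for sampling next states
    u_a = rng.random(n_steps)
    u_s = rng.random(n_steps)
    k = phi.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    s = 0
    for t in range(n_steps):
        # Sample a ~ MU(.|s) and s' ~ P(.|s, a) by inverse-CDF lookup
        # (clamped in case a CDF sums to slightly less than 1.0).
        a = min(int(np.searchsorted(cum_mu[s], u_a[t], side="right")), n_actions - 1)
        s_next = min(int(np.searchsorted(cum_p[s, a], u_s[t], side="right")), n_states - 1)
        rho = PI[s, a] / MU[s, a]  # per-step importance-sampling ratio
        A += rho * np.outer(phi[s], phi[s] - GAMMA * phi[s_next])
        b += rho * phi[s] * R[s, a]
        s = s_next
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    phi = np.eye(3)  # tabular features: the LSTD solution targets V^pi exactly
    print("estimate:", phi @ off_policy_lstd(phi))
    print("true V^pi:", true_value(PI))
    print("V^mu (behaviour policy):", true_value(MU))
```

The comparison with V^μ in the usage block shows that the estimate tracks the target policy's value rather than the value of the policy that generated the data.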

Pages: 3991-3997

Number of pages: 7
