Bounds for Off-policy Prediction in Reinforcement Learning

Citations: 0
Authors
Joseph, Ajin George [1 ]
Bhatnagar, Shalabh [1 ,2 ]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore, Karnataka, India
[2] Indian Inst Sci, Robert Bosch Ctr Cyber Phys Syst, Bangalore, Karnataka, India
Keywords
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy of interest, under a linear function approximation architecture, from a sample trajectory generated by a behaviour policy that is possibly different from the target policy. The stability of off-policy prediction was a long-standing open question; only recently did Yu provide a general proof, which makes our results all the more relevant to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where an on-policy sample trajectory is hard to obtain and one has to rely on the sampled behaviour of the system under an arbitrary policy. We derive an error bound on the solution of the off-policy prediction problem in terms of a closeness measure between the target and the behaviour policy.
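To make the setting of the abstract concrete, below is a minimal Python sketch of off-policy TD(0) prediction with linear function approximation and per-step importance sampling, the standard textbook scheme for this problem. It is an illustration under assumed inputs (the names off_policy_td0, features, target_probs, and behaviour_probs are hypothetical), not the authors' algorithm or their bound.

    import numpy as np

    def off_policy_td0(features, trajectory, target_probs, behaviour_probs,
                       alpha=0.01, gamma=0.95):
        # Minimal sketch, not the paper's method: estimate v_pi(s) ~ phi(s)^T theta
        # from a trajectory generated by the behaviour policy mu.
        # features[s] is the feature vector phi(s); target_probs[s, a] = pi(a|s);
        # behaviour_probs[s, a] = mu(a|s); trajectory holds (s, a, r, s') tuples.
        theta = np.zeros(features.shape[1])
        for s, a, r, s_next in trajectory:
            # Per-step importance ratio: corrects for actions being sampled
            # from mu instead of the target policy pi.
            rho = target_probs[s, a] / behaviour_probs[s, a]
            # TD(0) error under the current linear value estimate.
            delta = r + gamma * features[s_next] @ theta - features[s] @ theta
            # Importance-weighted semi-gradient update.
            theta = theta + alpha * rho * delta * features[s]
        return theta

This naive importance-weighted update is precisely the kind of scheme whose stability was long in question: when the behaviour policy differs substantially from the target policy, the ratios rho can be large and the iterates can diverge, which is why a stability proof (Yu) and error bounds on the resulting solution, as provided in this paper, matter.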
Pages: 3991-3997
Page count: 7