Bounds for Off-policy Prediction in Reinforcement Learning

Cited by: 0
Authors
Joseph, Ajin George [1]
Bhatnagar, Shalabh [1,2]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore, Karnataka, India
[2] Indian Inst Sci, Robert Bosch Ctr Cyber Phys Syst, Bangalore, Karnataka, India
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy of interest, using a linear function approximation architecture, from a sample trajectory generated by a behaviour policy that may differ from the target policy. The stability of off-policy prediction has long been an open question. Only recently did Yu provide a generalized proof, which makes our results more appealing to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where a sample trajectory is hard to obtain and one has to rely on the sample behaviour of the system with respect to an arbitrary policy. We provide here an error bound on the solution of off-policy prediction with respect to a closeness measure between the target and behaviour policies.
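The problem setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or its bound: it assumes a hypothetical two-state, two-action MDP, one-hot (tabular) features as the simplest linear architecture, and an importance-weighted least-squares TD(0) estimator as one common way to compute an off-policy prediction solution. All names (`step`, `TARGET`, `BEHAVIOUR`, `off_policy_lstd`) are illustrative assumptions.

```python
import random

GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2

# Hypothetical toy MDP (an assumption for illustration only):
# action 0 stays in the current state, action 1 switches state,
# and the reward is 1.0 whenever the next state is state 1.
def step(state, action):
    next_state = state if action == 0 else 1 - state
    return next_state, float(next_state == 1)

TARGET = [[0.2, 0.8], [0.2, 0.8]]     # pi(a | s), the policy to evaluate
BEHAVIOUR = [[0.5, 0.5], [0.5, 0.5]]  # mu(a | s), the policy generating data

def features(state):
    # One-hot features: the simplest linear function approximation.
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def off_policy_lstd(n_steps, seed=0):
    """Importance-weighted LSTD(0) on one trajectory drawn from BEHAVIOUR."""
    rng = random.Random(seed)
    d = N_STATES
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    state = 0
    for _ in range(n_steps):
        action = 0 if rng.random() < BEHAVIOUR[state][0] else 1
        next_state, reward = step(state, action)
        # Importance ratio corrects for sampling actions from mu instead of pi.
        rho = TARGET[state][action] / BEHAVIOUR[state][action]
        phi, phi_next = features(state), features(next_state)
        for i in range(d):
            b[i] += rho * phi[i] * reward
            for j in range(d):
                A[i][j] += rho * phi[i] * (phi[j] - GAMMA * phi_next[j])
        state = next_state
    return solve(A, b)

def true_values(n_sweeps=2000):
    """Exact policy evaluation for TARGET by iterating the Bellman equation."""
    v = [0.0] * N_STATES
    for _ in range(n_sweeps):
        new_v = []
        for s in range(N_STATES):
            total = 0.0
            for a in range(N_ACTIONS):
                ns, r = step(s, a)
                total += TARGET[s][a] * (r + GAMMA * v[ns])
            new_v.append(total)
        v = new_v
    return v

theta = off_policy_lstd(50000)   # weights estimated from behaviour-policy data
v_pi = true_values()             # reference values of the target policy
```

With one-hot features the estimated weights `theta` can be compared entry by entry with `v_pi`. Moving `BEHAVIOUR` further from `TARGET` widens the range of the importance ratios, which gives a rough, informal feel for why a closeness measure between the two policies matters.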
Pages: 3991 - 3997
Number of pages: 7
Related papers
50 in total
  • [21] Mixed experience sampling for off-policy reinforcement learning
    Yu, Jiayu
    Li, Jingyao
    Lu, Shuai
    Han, Shuai
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 251
  • [22] Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
    Zhang, Zeyu
    Su, Yi
    Yuan, Hui
    Wu, Yiran
    Balasubramanian, Rishab
    Wu, Qingyun
    Wang, Huazheng
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] Off-Policy Reinforcement Learning for H∞ Control Design
    Luo, Biao
    Wu, Huai-Ning
    Huang, Tingwen
    IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (01) : 65 - 76
  • [24] Research on Off-Policy Evaluation in Reinforcement Learning: A Survey
    Wang S.-R.
    Niu W.-J.
    Tong E.-D.
    Chen T.
    Li H.
    Tian Y.-Z.
    Liu J.-Q.
    Han Z.
    Li Y.-D.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (09) : 1926 - 1945
  • [25] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [26] Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift
    Gelada, Carles
    Bellemare, Marc G.
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 3647 - 3655
  • [27] Safe Off-policy Reinforcement Learning Using Barrier Functions
    Marvi, Zahra
    Kiumarsi, Bahare
    2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 2176 - 2181
  • [28] Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Yin, Ming
    Wang, Yu-Xiang
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [29] Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
    Liu, Xu-Hui
    Xue, Zhenghai
    Pang, Jing-Cheng
    Jiang, Shengyi
    Xu, Feng
    Yu, Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [30] Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
    Weiwei Wang
    Yuqiang Li
    Xianyi Wu
    Statistics and Computing, 2024, 34