Conformal Off-Policy Prediction

Cited by: 0
Authors
Zhang, Yingying [1]
Shi, Chengchun [2]
Luo, Shikai [3]
Affiliations
[1] East China Normal University, Shanghai, China
[2] London School of Economics and Political Science, London, United Kingdom
[3] ByteDance, Beijing, China
Source
International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 206, 2023
Funding
National Natural Science Foundation of China; Engineering and Physical Sciences Research Council (UK)
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging, and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect, and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms are applicable to prediction interval construction. Our method is justified theoretically and validated on synthetic data and real data from short-video platforms.
Pages: 18
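
The construction described in the abstract, where subsampled episodes are treated as if drawn from the target policy so that standard conformal prediction applies, can be illustrated with a minimal split conformal sketch. The pseudo-policy subsampling step itself is not reproduced here; the toy data, the random-forest return model, and all variable names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal split conformal sketch, assuming we already hold episodes whose
# (initial state, return) pairs behave as if generated by the target policy
# (e.g., after a pseudo-policy subsampling step, which is omitted here).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-in data: initial states and observed discounted returns.
states = rng.normal(size=(500, 4))                      # 500 episodes, 4-dim initial state
returns = states[:, 0] + rng.normal(scale=0.5, size=500)

# Split into a training fold (fit the return model) and a calibration fold.
train, calib = np.arange(250), np.arange(250, 500)
model = RandomForestRegressor(random_state=0).fit(states[train], returns[train])

# Nonconformity scores: absolute residuals on the calibration fold.
scores = np.abs(returns[calib] - model.predict(states[calib]))

# Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
alpha = 0.1
n = len(scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, min(q_level, 1.0), method="higher")

# Prediction interval for the return starting from a new initial state.
new_state = rng.normal(size=(1, 4))
center = model.predict(new_state)[0]
print(f"90% prediction interval: [{center - q_hat:.2f}, {center + q_hat:.2f}]")
```

The (n + 1) factor in the quantile level is the standard finite-sample correction that gives split conformal prediction its marginal coverage guarantee; the paper's contribution lies in making the calibration data exchangeable with returns under the target policy, which this sketch simply assumes.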