Conformal Off-Policy Prediction

Cited by: 0
Authors
Zhang, Yingying [1]
Shi, Chengchun [2]
Luo, Shikai [3]
Affiliations
[1] East China Normal University, Shanghai, China
[2] London School of Economics and Political Science, London, United Kingdom
[3] ByteDance, Beijing, China
Source
International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 206, 2023
Funding
National Natural Science Foundation of China; Engineering and Physical Sciences Research Council (UK)
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging, and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect, and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms are applicable to prediction interval construction. Our method is justified theoretically and validated on synthetic data and real data from short-video platforms.
Pages: 18
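
The construction described in the abstract, where subsampled episodes are treated as if drawn from the target policy so that standard conformal prediction applies, can be illustrated with a minimal split conformal sketch. The pseudo-policy subsampling step itself is not reproduced here; the toy data, the random-forest return model, and all variable names below are illustrative assumptions, not the authors' implementation.

```python
# Minimal split conformal sketch, assuming we already hold episodes whose
# (initial state, return) pairs behave as if generated by the target policy
# (e.g., after a pseudo-policy subsampling step, which is omitted here).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-in data: initial states and observed discounted returns.
states = rng.normal(size=(500, 4))                      # 500 episodes, 4-dim initial state
returns = states[:, 0] + rng.normal(scale=0.5, size=500)

# Split into a training fold (fit the return model) and a calibration fold.
train, calib = np.arange(250), np.arange(250, 500)
model = RandomForestRegressor(random_state=0).fit(states[train], returns[train])

# Nonconformity scores: absolute residuals on the calibration fold.
scores = np.abs(returns[calib] - model.predict(states[calib]))

# Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
alpha = 0.1
n = len(scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, min(q_level, 1.0), method="higher")

# Prediction interval for the return starting from a new initial state.
new_state = rng.normal(size=(1, 4))
center = model.predict(new_state)[0]
print(f"90% prediction interval: [{center - q_hat:.2f}, {center + q_hat:.2f}]")
```

The (n + 1) factor in the quantile level is the standard finite-sample correction that gives split conformal prediction its marginal coverage guarantee; the paper's contribution lies in making the calibration data exchangeable with returns under the target policy, which this sketch simply assumes.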