High Confidence Off-Policy Evaluation

Times cited: 0
Authors
Thomas, Philip S. [1, 2]
Theocharous, Georgios [1]
Ghavamzadeh, Mohammad [1, 3]
Affiliations
[1] Adobe Res, San Jose, CA 95110 USA
[2] Univ Massachusetts, Amherst, MA 01003 USA
[3] INRIA Lille, Lille, France
Source
PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2015
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
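The abstract's central idea is to estimate a new policy's expected return from trajectories generated by other policies, then report a lower confidence bound instead of a bare point estimate. The minimal Python sketch below illustrates that shape under stated assumptions. It uses ordinary per-trajectory importance sampling and, as a swapped-in stand-in, Hoeffding's inequality. This is not the concentration inequality developed in the paper. The sketch assumes that every importance-weighted return lies in [0, b] for a known b and that trajectories are i.i.d. The function names `importance_weighted_return` and `hoeffding_lower_bound`, the trajectory representation, and the example data are all hypothetical.

```python
import math
from typing import Sequence


def importance_weighted_return(
    eval_probs: Sequence[float],
    behavior_probs: Sequence[float],
    ret: float,
) -> float:
    """Ordinary importance-sampling estimate from a single trajectory (hypothetical helper).

    eval_probs[t] and behavior_probs[t] are the probabilities that the evaluation
    policy and the behavior policy assign to the action actually taken at step t;
    ret is the trajectory's observed return.
    """
    if len(eval_probs) != len(behavior_probs):
        raise ValueError("probability sequences must have equal length")
    weight = 1.0
    for p_e, p_b in zip(eval_probs, behavior_probs):
        # The behavior policy must give nonzero probability to every action taken.
        weight *= p_e / p_b
    return weight * ret


def hoeffding_lower_bound(samples: Sequence[float], upper: float, delta: float) -> float:
    """(1 - delta) lower confidence bound on the mean of i.i.d. samples in [0, upper].

    Stand-in assumption, not the paper's bound: Hoeffding's inequality
    P(sample_mean - E[X] >= t) <= exp(-2 n t^2 / upper^2), solved for t.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    mean = sum(samples) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Hypothetical data: (evaluation probs, behavior probs, return) per trajectory.
    trajectories = [
        ([0.5, 0.5], [0.5, 0.5], 1.0),
        ([0.9], [0.3], 0.5),
        ([0.1], [0.5], 1.0),
    ]
    estimates = [importance_weighted_return(e, b, r) for e, b, r in trajectories]
    # Assumed upper bound b = 3.0 on each importance-weighted return.
    print(hoeffding_lower_bound(estimates, upper=3.0, delta=0.05))
```

With only three trajectories, this bound falls far below the sample mean of 0.9 and is even negative. That shows how a valid but loose bound can be uninformative on small data sets. Its width shrinks in proportion to 1/sqrt(n), and it also grows with the assumed range b. This sensitivity is one reason a tighter inequality matters in practice.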
Pages: 3000-3006
Page count: 7