High Confidence Off-Policy Evaluation

Cited by: 0

Authors:

Thomas, Philip S. ^{[1,2]}

Theocharous, Georgios ^{[1]}

Ghavamzadeh, Mohammad ^{[1,3]}

Affiliations:

[1] Adobe Research, San Jose, CA 95110, USA

[2] University of Massachusetts, Amherst, MA 01003, USA

[3] INRIA Lille, Lille, France

Source:

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence | 2015

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
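The idea of a lower confidence bound computed from off-policy data can be illustrated with a minimal sketch. The abstract does not say which estimator or concentration inequality the paper uses, so the code below is not the authors' method. It assumes ordinary per-trajectory importance sampling, i.i.d. trajectories from a known behavior policy that gives nonzero probability to every action the evaluation policy can take, nonnegative returns, and Hoeffding's inequality applied to importance-weighted returns clipped at a hypothetical threshold `clip`. All function and parameter names are illustrative.

```python
import math


def importance_weighted_return(traj, pi_e, pi_b, gamma=1.0):
    """Importance-weighted discounted return of one trajectory.

    traj : list of (state, action, reward) tuples
    pi_e : function (state, action) -> probability under the evaluation policy
    pi_b : function (state, action) -> probability under the behavior policy
    """
    weight, ret, discount = 1.0, 0.0, 1.0
    for state, action, reward in traj:
        # Accumulate the likelihood ratio of the whole trajectory.
        weight *= pi_e(state, action) / pi_b(state, action)
        ret += discount * reward
        discount *= gamma
    return weight * ret


def hoeffding_lower_bound(samples, delta, clip):
    """Lower bound on the mean of nonnegative i.i.d. samples.

    Holds with probability at least 1 - delta (assumed setup, see lead-in).
    Clipping at `clip` keeps each sample in [0, clip], so Hoeffding's
    inequality applies. Clipping can only lower the mean of nonnegative
    samples, so the bound remains valid for the unclipped mean.
    """
    n = len(samples)
    clipped = [min(x, clip) for x in samples]
    mean = sum(clipped) / n
    return mean - clip * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def lower_confidence_bound(trajs, pi_e, pi_b, delta=0.05, clip=10.0, gamma=1.0):
    """Lower confidence bound on the expected return of pi_e from pi_b data."""
    samples = [importance_weighted_return(t, pi_e, pi_b, gamma) for t in trajs]
    return hoeffding_lower_bound(samples, delta, clip)
```

A larger `clip` discards less of each importance-weighted return but widens the Hoeffding term, so the threshold trades bias against looseness; how to choose it is left open in this sketch.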

Citation

Pages: 3000-3006

Number of pages: 7
