Exploration Using Without-Replacement Sampling of Actions Is Sometimes Inferior

Cited by: 2
Authors
Carden, Stephen W. [1 ]
Walker, S. Dalton [2 ]
Affiliations
[1] Georgia Southern Univ, Dept Math Sci, Statesboro, GA 30460 USA
[2] Air Force Mat Command, Robins Air Force Base, GA 31098 USA
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION | 2019, Vol. 1, No. 2
Keywords
count-based exploration; without-replacement sampling; stochastic shortest path; reinforcement learning; Markov decision processes; convergence
DOI
10.3390/make1020041
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases this has been proven, and in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by appealing to this heuristic. This paper details the counter-intuitive finding that, when the quality of an exploration strategy is measured by the stochastic shortest path to a goal state, there is a class of processes for which action selection based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments characterize the frequency and severity of this inferiority.
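To make the two action-sampling schemes in the abstract concrete, the following is a minimal simulation sketch. It assumes a hypothetical four-state chain MDP with a single goal state and two stochastic actions per state (the chain, its transition probabilities, and all names here are illustrative assumptions, not taken from the paper), and it estimates the expected time until the goal is first reached under with-replacement action selection (an action drawn uniformly on every visit) versus without-replacement action selection (a state's action set is exhausted before any action is repeated). The toy chain only illustrates the mechanics and the hitting-time estimate; it is not one of the counterexample processes the paper constructs.

```python
import random

# Hypothetical toy chain MDP (illustration only, not from the paper):
# states 0..3, goal state 3, two stochastic actions per state.
GOAL = 3
ACTIONS = [0, 1]

def step(state, action, rng):
    """Stochastic transition: action 0 usually advances, action 1 usually does not."""
    if action == 0:
        return min(state + 1, GOAL) if rng.random() < 0.7 else state
    return min(state + 1, GOAL) if rng.random() < 0.3 else max(state - 1, 0)

def hitting_time(scheme, rng, max_steps=10_000):
    """Number of steps until the goal state is first reached from state 0."""
    state, t = 0, 0
    pools = {}  # per-state pool of not-yet-tried actions (used only without replacement)
    while state != GOAL and t < max_steps:
        if scheme == "with":
            # With-replacement: every action is available on every visit.
            action = rng.choice(ACTIONS)
        else:
            # Without-replacement: exhaust a state's action set before repeating any.
            pool = pools.setdefault(state, [])
            if not pool:
                pool.extend(ACTIONS)
                rng.shuffle(pool)
            action = pool.pop()
        state = step(state, action, rng)
        t += 1
    return t

rng = random.Random(0)
for scheme in ("with", "without"):
    times = [hitting_time(scheme, rng) for _ in range(5_000)]
    print(f"{scheme}-replacement: mean hitting time = {sum(times) / len(times):.2f}")
```

On a process in the counterexample class identified by the paper, the analogous comparison would show a strictly larger mean hitting time under the without-replacement strategy.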
Pages: 698-714
Page count: 17