THE CONTINUUM-ARMED BANDIT PROBLEM

被引:114
作者
AGRAWAL, R
机构
关键词
BANDIT PROBLEMS; CONTROLLED IID PROCESS; STOCHASTIC ADAPTIVE CONTROL; CERTAINTY EQUIVALENCE WITH FORCING; LEARNING LOSS; CONTINUOUS ARMS;
D O I
10.1137/S0363012992237273
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper we consider the multiarmed bandit problem where the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much more difficult than the usual one with a finite number of arms because the built-in learning task is now infinite dimensional. We devise a kernel estimator-based learning scheme for the mean reward as a function of the arms. Using this learning scheme, we construct a class of certainty equivalence control with forcing schemes and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the o(n) required for optimality with respect to the average-cost-per-unit-time criterion.
引用
收藏
页码:1926 / 1951
页数:26
相关论文
共 28 条
[1]   CERTAINTY EQUIVALENCE CONTROL WITH FORCING - REVISITED [J].
AGRAWAL, R ;
TENEKETZIS, D .
SYSTEMS & CONTROL LETTERS, 1989, 13 (05) :405-412
[2]   ASYMPTOTICALLY EFFICIENT ADAPTIVE ALLOCATION RULES FOR THE MULTIARMED BANDIT PROBLEM WITH SWITCHING COST [J].
AGRAWAL, R ;
HEGDE, MV ;
TENEKETZIS, D .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1988, 33 (10) :899-905
[3]   ASYMPTOTICALLY EFFICIENT ADAPTIVE ALLOCATION SCHEMES FOR CONTROLLED IID PROCESSES - FINITE PARAMETER SPACE [J].
AGRAWAL, R ;
TENEKETZIS, D ;
ANANTHARAM, V .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1989, 34 (03) :258-267
[4]  
Agrawal R., 1990, Stochastics and Stochastics Reports, V29, P437, DOI 10.1080/17442509008833627
[5]  
AGRAWAL R, 1992, ADAPTIVE CONTROL IID
[6]   ASYMPTOTICALLY EFFICIENT ALLOCATION RULES FOR THE MULTIARMED BANDIT PROBLEM WITH MULTIPLE PLAYS .1. IID REWARDS [J].
ANANTHARAM, V ;
VARAIYA, P ;
WALRAND, J .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1987, 32 (11) :968-976
[7]  
ANANTHARAM V, 1987, IEEE T AUTOMAT CONTR, V32, P975
[8]  
[Anonymous], 1973, STOCHASTIC APPROXIMA
[9]  
[Anonymous], 1978, STOCHASTIC APPROXIMA
[10]   DIFFUSION FOR GLOBAL OPTIMIZATION IN RN [J].
CHIANG, TS ;
HWANG, CR ;
SHEU, SJ .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1987, 25 (03) :737-753