THE CONTINUUM-ARMED BANDIT PROBLEM

Cited by: 114

Authors:

AGRAWAL, R

Affiliation:

Source:

SIAM JOURNAL ON CONTROL AND OPTIMIZATION | 1995 / Vol. 33 / No. 6

Keywords:

BANDIT PROBLEMS; CONTROLLED IID PROCESS; STOCHASTIC ADAPTIVE CONTROL; CERTAINTY EQUIVALENCE WITH FORCING; LEARNING LOSS; CONTINUOUS ARMS;

DOI:

10.1137/S0363012992237273

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper we consider the multiarmed bandit problem where the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much more difficult than the usual one with a finite number of arms because the built-in learning task is now infinite dimensional. We devise a kernel estimator-based learning scheme for the mean reward as a function of the arms. Using this learning scheme, we construct a class of certainty equivalence control with forcing schemes and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the o(n) required for optimality with respect to the average-cost-per-unit-time criterion.
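
The idea in the abstract, a kernel estimate of the mean reward combined with certainty equivalence control and occasional forced exploration, can be illustrated with a minimal sketch. This is not the paper's scheme. The arm set [0, 1], the finite grid, the triangular kernel, the fixed bandwidth, the Gaussian noise, and the rule of forcing at perfect-square time steps are all assumptions made only for illustration, and every name below is hypothetical.

```python
import math
import random


def triangular_kernel(u):
    """Triangular kernel: weight 1 at u = 0, falling linearly to 0 at |u| >= 1."""
    return max(0.0, 1.0 - abs(u))


def run_ce_with_forcing(mean_reward, horizon=2000, grid_size=21,
                        bandwidth=0.1, noise_sd=0.05, seed=0):
    """Play a bandit on [0, 1] and return the list of arms pulled.

    Assumed setup (not taken from the paper): arms are restricted to a
    uniform grid, rewards are mean_reward(arm) plus Gaussian noise, and
    forced exploration happens at perfect-square time steps.
    """
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Running numerator and denominator of a kernel-weighted average
    # (Nadaraya-Watson style) of observed rewards at each grid point.
    num = [0.0] * grid_size
    den = [0.0] * grid_size
    forced = 0
    pulls = []
    for t in range(1, horizon + 1):
        if math.isqrt(t) ** 2 == t:
            # Forcing step: sweep the grid in order, ignoring the estimates.
            arm = grid[forced % grid_size]
            forced += 1
        else:
            # Certainty equivalence step: act as if the estimate were exact.
            estimates = [n / d if d > 0 else float("-inf")
                         for n, d in zip(num, den)]
            best = max(range(grid_size), key=lambda j: estimates[j])
            arm = grid[best]
        reward = mean_reward(arm) + rng.gauss(0.0, noise_sd)
        # Spread the new observation over nearby grid points.
        for j, x in enumerate(grid):
            w = triangular_kernel((x - arm) / bandwidth)
            num[j] += w * reward
            den[j] += w
        pulls.append(arm)
    return pulls


if __name__ == "__main__":
    # Hypothetical smooth mean-reward function with its maximum at 0.7.
    pulls = run_ce_with_forcing(lambda x: 1.0 - 4.0 * (x - 0.7) ** 2)
    tail = pulls[-500:]
    print("average arm over the last 500 steps:", sum(tail) / len(tail))
```

Forcing at perfect squares keeps the number of exploratory pulls on the order of the square root of the horizon, so exploration alone contributes sublinear loss in this toy setting. The rates derived in the paper depend on its own assumptions and schedule, which this record does not state.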

Pages: 1926-1951

Page count: 26

References: 28 in total

[1] CERTAINTY EQUIVALENCE CONTROL WITH FORCING - REVISITED [J].