STOCHASTIC CONVEX OPTIMIZATION WITH BANDIT FEEDBACK

被引:39
作者
Agarwal, Alekh [1 ]
Foster, Dean P. [2 ]
Hsu, Daniel [3 ]
Kakade, Sham M. [3 ]
Rakhlin, Alexander [2 ]
机构
[1] Microsoft Res, New York, NY 10016 USA
[2] Univ Penn, Dept Stat, Philadelphia, PA 19104 USA
[3] Microsoft Res, Cambridge, MA 02142 USA
基金
美国国家科学基金会;
关键词
derivative-free optimization; bandit optimization; ellipsoid method;
D O I
10.1137/110850827
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x is an element of X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs (O) over tilde (poly(d)root T) regret. Since any algorithm has regret at least Omega(root T) on this problem, our algorithm is optimal in terms of the scaling with T.
引用
收藏
页码:213 / 240
页数:28
相关论文
共 20 条
  • [1] AGARWAL A., 2010, P COLT
  • [2] THE CONTINUUM-ARMED BANDIT PROBLEM
    AGRAWAL, R
    [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1995, 33 (06) : 1926 - 1951
  • [3] Improved rates for the stochastic continuum-armed bandit problem
    Auer, Peter
    Ortner, Ronald
    Szepesvari, Csaba
    [J]. LEARNING THEORY, PROCEEDINGS, 2007, 4539 : 454 - +
  • [4] Solving convex programs by random walks
    Bertsimas, D
    Vempala, S
    [J]. JOURNAL OF THE ACM, 2004, 51 (04) : 540 - 556
  • [5] Bubeck S, 2011, J MACH LEARN RES, V12, P1655
  • [6] Buldygin V. V., 1980, Ukr. Math. J, V32, P483, DOI [DOI 10.1007/BF01087176, 10.1007/BF01087176]
  • [7] Conn A. R., 2009, Introduction to Derivative-Free Optimization
  • [8] Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems
    Cope, Eric W.
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2009, 54 (06) : 1243 - 1253
  • [9] DANI V., 2008, P 21 ANN C LEARN THE
  • [10] Flaxman AD, 2005, PROCEEDINGS OF THE SIXTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P385