The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Cited by: 0
Authors
Bayati, Mohsen [1]
Hamidi, Nima [1]
Johari, Ramesh [1]
Khosravi, Khashayar [2]
Affiliations
[1] Stanford Univ, Stanford, CA USA
[2] Google Res NYC, Mountain View, CA 94043 USA
Funding
National Science Foundation (USA);
Keywords
ALLOCATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with k the number of arms and T the time horizon, we consider the case where k ≥ √T. We first show that subsampling is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to sub-optimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples Θ(√T) arms and executes UCB only on that subset, is rate-optimal. Despite theoretically optimal regret, even SS-UCB performs poorly due to excessive exploration of suboptimal arms. In particular, in numerical experiments SS-UCB performs worse than a simple greedy algorithm (and its subsampled version) that pulls the current empirical best arm at every time period. We show that these insights hold even in a contextual setting, using real-world data. These empirical results suggest a novel form of free exploration in the many-armed regime that benefits greedy algorithms. We theoretically study this new source of free exploration and find that it is deeply connected to the distribution of a certain tail event for the prior distribution of arm rewards. This is a fundamentally distinct phenomenon from free exploration as discussed in the recent literature on contextual bandits, where free exploration arises due to variation in contexts. We use this insight to prove that the subsampled greedy algorithm is rate-optimal for Bernoulli bandits when k > √T, and achieves sublinear regret with more general distributions. This is a case where theoretical rate optimality does not tell the whole story: when complemented by the empirical observations of our paper, the power of greedy algorithms becomes quite evident. Taken together, from a practical standpoint, our results suggest that in applications it may be preferable to use a variant of the greedy algorithm in the many-armed regime.
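The subsampled greedy idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `subsampled_greedy`, the default subsample size of ceil(√T), the initial pass that pulls each subsampled arm once, the lowest-index tie-breaking, and the regret definition (measured against the best of all k arms) are all assumptions made here for illustration.

```python
import math
import random


def subsampled_greedy(arm_means, horizon, rng, subsample_size=None):
    """Run greedy on a random subset of Bernoulli arms; return cumulative regret.

    Assumptions (not taken from the abstract): each subsampled arm is pulled
    once before greedy selection starts, ties go to the lowest index, and
    regret is measured against the best arm among all k arms.
    """
    k = len(arm_means)
    if subsample_size is None:
        # Assumed default: about sqrt(T) arms, capped at k.
        subsample_size = min(k, max(1, math.ceil(math.sqrt(horizon))))
    subset = rng.sample(range(k), subsample_size)
    pulls = [0] * subsample_size
    successes = [0] * subsample_size
    best_mean = max(arm_means)
    regret = 0.0
    for t in range(horizon):
        if t < subsample_size:
            i = t  # initial pass: pull each subsampled arm once
        else:
            # Greedy step: pick the arm with the highest empirical mean.
            i = max(range(subsample_size), key=lambda j: successes[j] / pulls[j])
        arm = subset[i]
        reward = 1 if rng.random() < arm_means[arm] else 0
        pulls[i] += 1
        successes[i] += reward
        regret += best_mean - arm_means[arm]
    return regret


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical instance: 200 arms with means drawn uniformly from [0, 1].
    means = [rng.random() for _ in range(200)]
    print(subsampled_greedy(means, horizon=1000, rng=rng))
```

With k = 200 and T = 1000, the default subsample holds 32 arms, so the greedy rule only ever compares empirical means within that subset. Replacing the greedy step with an upper-confidence index would give a sketch of the subsampled UCB variant the abstract mentions.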
Pages: 11