Approximate information for efficient exploration-exploitation strategies

Cited: 0
Authors
Barbier-Chebbah, Alex [1,2]
Vestergaard, Christian L. [1,2]
Masson, Jean-Baptiste [1,2]
Affiliations
[1] Université Paris Cité, Institut Pasteur, CNRS UMR 3571, Decision and Bayesian Computation, F-75015 Paris, France
[2] Épiméthée, Inria, F-75012 Paris, France
Keywords
Clinical trials; Bandit models; Space; Go
DOI
10.1103/PhysRevE.109.L052105
CLC Classification
O35 [Fluid Mechanics]; O53 [Plasma Physics]
Discipline Codes
070204; 080103; 080704
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. These involve an agent deciding whether to exploit current knowledge for immediate gains or to explore new avenues for potential long-term rewards. Here we introduce a class of algorithms, approximate information maximization (AIM), which employs a carefully chosen analytical approximation to the gradient of the entropy to choose which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax, from which it derives. AIM thus retains the advantages of Infomax while offering greater computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, allowing it to be optimized for specific settings and making it possible to surpass the performance of Thompson sampling at short and intermediate times.
Pages: 6
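
The abstract describes AIM's selection rule only at a high level (a deterministic criterion built from an analytical approximation to the gradient of the posterior entropy), so its exact form is not reproduced here. For reference, below is a minimal sketch of the Thompson-sampling baseline the paper benchmarks against, run on a 50-armed Bernoulli bandit with Beta posteriors; all variable names and parameter values are illustrative, not taken from the paper.

# Minimal sketch of the Thompson-sampling baseline for a 50-armed
# Bernoulli bandit. AIM (not reproduced here) would replace the
# posterior-sampling step with a deterministic choice based on an
# approximate entropy gradient; the exact rule is given in the paper.
import numpy as np

rng = np.random.default_rng(0)

n_arms = 50                               # the paper demonstrates a 50-armed game
true_means = rng.uniform(0, 1, n_arms)    # hidden Bernoulli success rates
alpha = np.ones(n_arms)                   # Beta posterior: successes + 1
beta = np.ones(n_arms)                    # Beta posterior: failures + 1

horizon = 10_000
regret = 0.0
best_mean = true_means.max()

for t in range(horizon):
    # Thompson sampling: draw one sample per arm from its Beta
    # posterior and pull the arm whose sample is largest.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))

    reward = rng.random() < true_means[arm]   # Bernoulli reward
    alpha[arm] += reward
    beta[arm] += 1 - reward
    regret += best_mean - true_means[arm]

print(f"cumulative regret after {horizon} pulls: {regret:.1f}")

In this baseline, the per-step cost is dominated by drawing one posterior sample per arm. Per the abstract, AIM instead evaluates a tunable closed-form criterion, which is what gives it Infomax-like behavior at lower computational cost.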