Approximate information for efficient exploration-exploitation strategies

Cited: 0
Authors
Barbier-Chebbah, Alex [1 ,2 ]
Vestergaard, Christian L. [1 ,2 ]
Masson, Jean-Baptiste [1 ,2 ]
Affiliations
[1] Univ Paris Cite, Inst Pasteur, CNRS UMR 3571, Decis & Bayesian Computat, F-75015 Paris, France
[2] Epimethee, Inria, F-75012 Paris, France
Keywords
CLINICAL-TRIALS; BANDIT MODELS; SPACE; GO;
DOI
10.1103/PhysRevE.109.L052105
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics];
Subject Classification Codes
070204 ; 080103 ; 080704 ;
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems, in which an agent must decide whether to exploit current knowledge for immediate gain or explore new options for potential long-term reward. We introduce a class of algorithms, approximate information maximization (AIM), which uses a carefully chosen analytical approximation of the entropy gradient to select which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax, from which it derives. AIM thus retains the advantages of Infomax while offering greater computational speed, tractability, and ease of implementation; in particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, allowing it to be optimized for specific settings and to surpass the performance of Thompson sampling at short and intermediate times.
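The abstract benchmarks AIM against Thompson sampling, the asymptotically optimal baseline it matches. For context, here is a minimal sketch of Thompson sampling on a Bernoulli bandit: each arm keeps a Beta posterior over its success probability, and at each step the arm with the largest posterior sample is pulled. The AIM entropy-gradient approximation itself is defined in the paper and is not reproduced here; the arm means and horizon below are illustrative choices, not taken from the paper.

```python
import numpy as np

def thompson_sampling(means, horizon, rng):
    """Bernoulli multi-armed bandit played with Thompson sampling.

    Each arm i keeps a Beta(successes_i + 1, failures_i + 1) posterior
    (uniform prior). At every step one value is sampled from each
    posterior and the arm with the largest sample is pulled.
    """
    k = len(means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Posterior sampling: one draw per arm, pick the best draw.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))
        # Bernoulli reward with the arm's (hidden) success probability.
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    # Pseudo-regret: shortfall relative to always pulling the best arm.
    regret = horizon * max(means) - pulls @ np.asarray(means)
    return pulls, regret

rng = np.random.default_rng(0)
pulls, regret = thompson_sampling([0.3, 0.5, 0.7], horizon=5000, rng=rng)
```

Over a long horizon, the posterior of the best arm concentrates and exploration of the suboptimal arms tapers off, which is the asymptotic optimality the abstract refers to.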
Pages: 6
Related Papers
41 records
  • [1] [Anonymous], 2015, Decision, DOI 10.1037/dec0000033
  • [3] Finite-time analysis of the multiarmed bandit problem
    Auer, P
    Cesa-Bianchi, N
    Fischer, P
    [J]. MACHINE LEARNING, 2002, 47 (2-3) : 235 - 256
  • [4] Bayati M., 2020, Advances in Neural Information Processing Systems, V33, P1713
  • [5] Survey on Applications of Multi-Armed and Contextual Bandits
    Bouneffouf, Djallel
    Rish, Irina
    Aggarwal, Charu
    [J]. 2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020,
  • [6] Bandit Models of Human Behavior: Reward Processing in Mental Disorders
    Bouneffouf, Djallel
    Rish, Irina
    Cecchi, Guillermo A.
    [J]. ARTIFICIAL GENERAL INTELLIGENCE: 10TH INTERNATIONAL CONFERENCE, AGI 2017, 2017, 10414 : 237 - 248
  • [7] Pure exploration in finitely-armed and continuous-armed bandits
    Bubeck, Sebastien
    Munos, Remi
    Stoltz, Gilles
    [J]. THEORETICAL COMPUTER SCIENCE, 2011, 412 (19) : 1832 - 1852
  • [8] Navigation Along Windborne Plumes of Pheromone and Resource-Linked Odors
    Carde, Ring T.
    [J]. ANNUAL REVIEW OF ENTOMOLOGY, 2021, 66 : 317 - 336
  • [9] Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration
    Cohen, Jonathan D.
    McClure, Samuel M.
    Yu, Angela J.
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2007, 362 (1481) : 933 - 942
  • [10] Interactive Anomaly Detection on Attributed Networks
    Ding, Kaize
    Li, Jundong
    Liu, Huan
    [J]. PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19), 2019, : 357 - 365