Approximate information for efficient exploration-exploitation strategies

Cited: 0
Authors
Barbier-Chebbah, Alex [1 ,2 ]
Vestergaard, Christian L. [1 ,2 ]
Masson, Jean-Baptiste [1 ,2 ]
Affiliations
[1] Univ Paris Cite, Inst Pasteur, CNRS UMR 3571, Decis & Bayesian Computat, F-75015 Paris, France
[2] Epimethee, Inria, F-75012 Paris, France
Keywords
CLINICAL-TRIALS; BANDIT MODELS; SPACE; GO;
DOI
10.1103/PhysRevE.109.L052105
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics];
Subject Classification Codes
070204 ; 080103 ; 080704 ;
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems, in which an agent must decide whether to exploit current knowledge for immediate gain or explore new options for potential long-term reward. We introduce a class of algorithms, approximate information maximization (AIM), which uses a carefully chosen analytical approximation of the entropy gradient to select which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax, from which it derives. AIM thus retains the advantages of Infomax while offering greater computational speed, tractability, and ease of implementation; in particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, allowing it to be optimized for specific settings and to surpass the performance of Thompson sampling at short and intermediate times.
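The abstract benchmarks AIM against Thompson sampling, the asymptotically optimal baseline it matches. For context, here is a minimal sketch of Thompson sampling on a Bernoulli bandit: each arm keeps a Beta posterior over its success probability, and at each step the arm with the largest posterior sample is pulled. The AIM entropy-gradient approximation itself is defined in the paper and is not reproduced here; the arm means and horizon below are illustrative choices, not taken from the paper.

```python
import numpy as np

def thompson_sampling(means, horizon, rng):
    """Bernoulli multi-armed bandit played with Thompson sampling.

    Each arm i keeps a Beta(successes_i + 1, failures_i + 1) posterior
    (uniform prior). At every step one value is sampled from each
    posterior and the arm with the largest sample is pulled.
    """
    k = len(means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Posterior sampling: one draw per arm, pick the best draw.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))
        # Bernoulli reward with the arm's (hidden) success probability.
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    # Pseudo-regret: shortfall relative to always pulling the best arm.
    regret = horizon * max(means) - pulls @ np.asarray(means)
    return pulls, regret

rng = np.random.default_rng(0)
pulls, regret = thompson_sampling([0.3, 0.5, 0.7], horizon=5000, rng=rng)
```

Over a long horizon, the posterior of the best arm concentrates and exploration of the suboptimal arms tapers off, which is the asymptotic optimality the abstract refers to.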
Pages: 6
Related Papers
41 records
  • [1] [Anonymous], 2015, Decision, DOI 10.1037/dec0000033
  • [3] Finite-time analysis of the multiarmed bandit problem
    Auer, P
    Cesa-Bianchi, N
    Fischer, P
    [J]. MACHINE LEARNING, 2002, 47 (2-3) : 235 - 256
  • [4] Bayati M., 2020, Advances in Neural Information Processing Systems, V33, P1713
  • [5] Survey on Applications of Multi-Armed and Contextual Bandits
    Bouneffouf, Djallel
    Rish, Irina
    Aggarwal, Charu
    [J]. 2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020,
  • [6] Bandit Models of Human Behavior: Reward Processing in Mental Disorders
    Bouneffouf, Djallel
    Rish, Irina
    Cecchi, Guillermo A.
    [J]. ARTIFICIAL GENERAL INTELLIGENCE: 10TH INTERNATIONAL CONFERENCE, AGI 2017, 2017, 10414 : 237 - 248
  • [7] Pure exploration in finitely-armed and continuous-armed bandits
    Bubeck, Sebastien
    Munos, Remi
    Stoltz, Gilles
    [J]. THEORETICAL COMPUTER SCIENCE, 2011, 412 (19) : 1832 - 1852
  • [8] Navigation Along Windborne Plumes of Pheromone and Resource-Linked Odors
    Carde, Ring T.
    [J]. ANNUAL REVIEW OF ENTOMOLOGY, 2021, 66 : 317 - 336
  • [9] Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration
    Cohen, Jonathan D.
    McClure, Samuel M.
    Yu, Angela J.
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2007, 362 (1481) : 933 - 942
  • [10] Interactive Anomaly Detection on Attributed Networks
    Ding, Kaize
    Li, Jundong
    Liu, Huan
    [J]. PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19), 2019, : 357 - 365