Adaptive Noise Exploration for Neural Contextual Multi-Armed Bandits

Cited by: 0
Authors
Wang, Chi [1 ]
Shi, Lin [1 ]
Luo, Junru [1 ]
Affiliations
[1] Changzhou Univ, Sch Comp Sci & Artificial Intelligence, Changzhou 213000, Peoples R China
Keywords
multi-armed bandits; exploration and exploitation; adaptive noise exploration;
DOI
10.3390/a18020056
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In contextual multi-armed bandits, the relationship between contextual information and rewards is typically unknown, complicating the trade-off between exploration and exploitation. A common approach to address this challenge is the Upper Confidence Bound (UCB) method, which constructs confidence intervals to guide exploration. However, the UCB method becomes computationally expensive in environments with numerous arms and dynamic contexts. This paper presents an adaptive noise exploration framework to reduce computational complexity and introduces two novel algorithms: EAD (Exploring Adaptive Noise in Decision-Making Processes) and EAP (Exploring Adaptive Noise in Parameter Spaces). EAD injects adaptive noise into the reward signals based on arm selection frequency, while EAP adds adaptive noise to the hidden layer of the neural network for more stable exploration. Experimental results on recommendation and classification tasks show that both algorithms significantly surpass traditional linear and neural methods in computational efficiency and overall performance.
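The abstract describes the core mechanism of EAD: noise injected into reward estimates, with a scale that adapts to how often each arm has been selected. The sketch below illustrates that idea in a simplified non-contextual setting; the class name, the running-mean estimator, and the 1/sqrt(n) noise schedule are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random


class AdaptiveNoiseBandit:
    """Sketch of frequency-adaptive noise exploration (EAD-style):
    each arm's reward estimate is perturbed by Gaussian noise whose
    scale shrinks as the arm is selected more often, so rarely pulled
    arms keep getting explored while well-sampled arms are exploited.
    """

    def __init__(self, n_arms, base_scale=1.0, seed=0):
        self.counts = [0] * n_arms      # selection frequency per arm
        self.means = [0.0] * n_arms     # running mean reward per arm
        self.base_scale = base_scale
        self.rng = random.Random(seed)

    def select(self):
        scores = []
        for mean, n in zip(self.means, self.counts):
            # Assumed schedule: noise scale decays as 1/sqrt(pulls + 1),
            # so less-frequently chosen arms receive wider perturbations.
            scale = self.base_scale / math.sqrt(n + 1)
            scores.append(mean + self.rng.gauss(0.0, scale))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Usage: two Bernoulli arms with success rates 0.8 and 0.2; the
# adaptive noise concentrates pulls on the better arm over time.
bandit = AdaptiveNoiseBandit(2, seed=1)
env = random.Random(2)
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if env.random() < (0.8 if arm == 0 else 0.2) else 0.0
    bandit.update(arm, reward)
```

EAP, by contrast, would add the adaptive noise to a neural network's hidden-layer activations rather than to the reward estimates; the same shrinking-scale idea applies, but the perturbation happens in parameter/representation space.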
Pages: 17