Adaptive Noise Exploration for Neural Contextual Multi-Armed Bandits

Cited: 0
Authors
Wang, Chi [1 ]
Shi, Lin [1 ]
Luo, Junru [1 ]
Affiliations
[1] Changzhou University, School of Computer Science and Artificial Intelligence, Changzhou 213000, People's Republic of China
Keywords
multi-armed bandits; exploration and exploitation; adaptive noise exploration;
DOI
10.3390/a18020056
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In contextual multi-armed bandits, the relationship between contextual information and rewards is typically unknown, complicating the trade-off between exploration and exploitation. A common approach to address this challenge is the Upper Confidence Bound (UCB) method, which constructs confidence intervals to guide exploration. However, the UCB method becomes computationally expensive in environments with numerous arms and dynamic contexts. This paper presents an adaptive noise exploration framework to reduce computational complexity and introduces two novel algorithms: EAD (Exploring Adaptive Noise in Decision-Making Processes) and EAP (Exploring Adaptive Noise in Parameter Spaces). EAD injects adaptive noise into the reward signals based on arm selection frequency, while EAP adds adaptive noise to the hidden layer of the neural network for more stable exploration. Experimental results on recommendation and classification tasks show that both algorithms significantly surpass traditional linear and neural methods in computational efficiency and overall performance.
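The EAD idea described in the abstract, injecting reward-estimate noise that shrinks with each arm's selection count, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the shared linear (ridge-regression) reward model standing in for their neural estimator, the 1/sqrt(count) noise schedule, and the name `ead_style_bandit` are all assumptions made for the sketch.

```python
import numpy as np

def ead_style_bandit(contexts, true_theta, T=300, sigma0=1.0, seed=0):
    """Hypothetical sketch of EAD-style adaptive-noise exploration.

    Each round, Gaussian noise is added to every arm's estimated reward;
    the noise scale decays with that arm's selection count, so rarely
    pulled arms are explored more aggressively.
    """
    rng = np.random.default_rng(seed)
    n_arms, d = contexts.shape
    counts = np.zeros(n_arms)
    # Shared ridge-regression reward model (stand-in for the paper's
    # neural estimator -- an assumption of this sketch).
    A = np.eye(d)
    b = np.zeros(d)
    total_reward = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(A, b)
        est = contexts @ theta_hat
        # Adaptive noise: scale sigma0 / sqrt(count + 1) per arm.
        noise = rng.normal(0.0, sigma0 / np.sqrt(counts + 1.0))
        arm = int(np.argmax(est + noise))
        x = contexts[arm]
        reward = float(x @ true_theta + 0.1 * rng.normal())
        counts[arm] += 1
        A += np.outer(x, x)
        b += reward * x
        total_reward += reward
    return counts, total_reward

# Small synthetic run: 5 arms with 4-dimensional contexts.
rng = np.random.default_rng(1)
contexts = rng.normal(size=(5, 4))
theta = rng.normal(size=4)
counts, total = ead_style_bandit(contexts, theta, T=300, seed=2)
```

The EAP variant described in the abstract would instead perturb hidden-layer activations of the network with similarly adaptive noise, which the paper argues yields more stable exploration; that neural variant is not sketched here.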
Pages: 17