A novel action decision method of deep reinforcement learning based on a neural network and confidence bound

Cited by: 2
Authors
Zhang, Wenhao [1]
Song, Yaqing [1]
Liu, Xiangpeng [1]
Shangguan, Qianqian [1]
An, Kang [1]
Affiliations
[1] Shanghai Normal Univ, Coll Informat Mech & Elect Engn, Shanghai 201418, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
UCB; Exploration and exploitation; Deep reinforcement learning; Machine learning;
DOI
10.1007/s10489-023-04695-1
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In deep reinforcement learning, the excessive randomness of the ε-greedy method degrades the agent's training performance. This paper proposes a novel action decision method that replaces ε-greedy and avoids this excessive randomness. First, a confidence bound span fitting model based on a deep neural network is proposed to address the fundamental problem that UCB cannot estimate the confidence bound span of each action in a high-dimensional state space. Then, a confidence bound span balance model based on target values in reverse order is proposed: after each action decision, the parameters of the U network are updated by backpropagation to balance the confidence bound span. Finally, a dynamic exploration-exploitation balance factor α is introduced to balance exploration and exploitation during training. Experiments conducted with the Nature DQN and Double DQN algorithms demonstrate that, under the baseline algorithms and experimental environments of this paper, the proposed method achieves higher performance than the ε-greedy method. The method presented here is significant for applying confidence bounds to complex reinforcement learning problems.
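
To illustrate the idea summarized in the abstract, the following Python sketch (an assumption, not the authors' implementation) shows how a separate "U network" could predict a per-action confidence bound span that is added to a DQN's Q-values, weighted by a balance factor α, to select actions instead of ε-greedy. The names MLP and select_action, the network sizes, the PyTorch dependency, and the toy dimensions are all illustrative choices.

    # Minimal sketch: UCB-style action selection with a learned confidence bound span.
    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        """Two-layer perceptron used here for both the Q network and the U network."""
        def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)

    def select_action(q_net: MLP, u_net: MLP, state: torch.Tensor, alpha: float) -> int:
        """Pick the action maximising Q(s, a) + alpha * U(s, a).

        q_net : estimates action values, as in a standard DQN.
        u_net : estimates a confidence bound span per action, replacing the
                count-based UCB bonus that is unavailable in a high-dimensional
                state space.
        alpha : exploration-exploitation balance factor; annealing it over
                training shifts the policy from exploration toward exploitation.
        """
        with torch.no_grad():
            q_values = q_net(state)
            u_spans = u_net(state)
            return int(torch.argmax(q_values + alpha * u_spans).item())

    if __name__ == "__main__":
        state_dim, n_actions = 4, 2      # e.g. CartPole-like dimensions
        q_net = MLP(state_dim, n_actions)
        u_net = MLP(state_dim, n_actions)
        alpha = 1.0                      # large early in training, decayed later
        s = torch.zeros(1, state_dim)    # dummy observation
        print(select_action(q_net, u_net, s, alpha))

In the paper's scheme the U network would additionally be updated by backpropagation after each action decision to balance the confidence bound spans; that update rule is specific to the paper and is not reproduced in this sketch.
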
Pages: 21299-21311
Number of pages: 13