NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Cited by: 0
Authors
Paczolay, Gabor [1 ]
Harmati, Istvan [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary
Keywords
reinforcement learning; DQN; NPV; NPV-DQN;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time step. In this paper, a system is used that maintains two Q-value estimates, each based on a distinct discount factor. These estimates can later be merged into one network to make the computations more efficient. The choice of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous the maximum is, the higher the probability assigned to selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
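The abstract does not state the exact selection rule, so the following is only a minimal sketch of the idea under stated assumptions. It supposes that "unambiguous" is measured by the gap between the best and second-best short-term Q-values, scaled by their overall spread. The names `max_clarity` and `select_action` and the measure itself are hypothetical and are not taken from the paper.

```python
import random


def max_clarity(q_values):
    """Score in [0, 1] for how clearly one action dominates.

    Assumed measure (not from the paper): gap between the best and
    second-best Q-value divided by the spread of all Q-values.
    Expects at least two actions.
    """
    ordered = sorted(q_values, reverse=True)
    spread = ordered[0] - ordered[-1]
    if spread == 0:
        return 0.0  # all actions look identical: no clear maximum
    return (ordered[0] - ordered[1]) / spread


def select_action(q_short, q_long, rng=random):
    """Greedy action from the short-term or the long-term estimate.

    The short-term estimate is chosen with probability equal to its
    clarity score; otherwise the long-term estimate is used.
    """
    p_short = max_clarity(q_short)
    source = q_short if rng.random() < p_short else q_long
    return max(range(len(source)), key=source.__getitem__)
```

With this assumed measure, a short-term estimate such as `[0.0, 0.0, 1.0]` is always trusted, while a flat estimate such as `[1.0, 1.0, 1.0]` always defers to the long-term one.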
Pages: 175 - 190
Number of pages: 16
Related papers
39 in total
  • [1] Reinforcement Learning for value-based Placement of Fog Services
    Poltronieri, Filippo
    Tortonesi, Mauro
    Stefanelli, Cesare
    Suri, Niranjan
    2021 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2021), 2021, : 466 - 472
  • [2] Transition Based Discount Factor for Model Free Algorithms in Reinforcement Learning
    Sharma, Abhinav
    Gupta, Ruchir
    Lakshmanan, K.
    Gupta, Atul
    SYMMETRY-BASEL, 2021, 13 (07):
  • [3] A multi process value-based reinforcement learning environment framework for adaptive traffic signal control
    Cao, Jie
    Huang, Dailin
    Hou, Liang
    Ma, Jialin
    JOURNAL OF CONTROL AND DECISION, 2023, 10 (02) : 229 - 236
  • [4] Advances in Value-based, Policy-based, and Deep Learning-based Reinforcement Learning
    Byeon, Haewon
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (08) : 348 - 354
  • [5] Learning variable impedance control based on reinforcement learning
    Li C.
    Zhang Z.
    Xia G.
    Xie X.
    Zhu Q.
    Liu Q.
    Harbin Gongcheng Daxue Xuebao/Journal of Harbin Engineering University, 2019, 40 (02) : 304 - 311
  • [6] Applying Value-Based Deep Reinforcement Learning on KPI Time Series Anomaly Detection
    Zhang, Yu
    Wang, Tianbo
    2022 IEEE 15TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2022), 2022, : 197 - 202
  • [7] Rethinking Exploration and Experience Exploitation in Value-Based Multi-Agent Reinforcement Learning
    Borzilov, Anatolii
    Skrynnik, Alexey
    Panov, Aleksandr
    IEEE ACCESS, 2025, 13 : 13770 - 13781
  • [8] Variable Sampling Period Adaptive Control Based on Reinforcement Learning
    Lemos, Joao M.
    Parente, Francisco
    Cunha, Rita
    CONTROLO 2022, 2022, 930 : 577 - 586
  • [9] Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond
    Morita, Kenji
    Jitsev, Jenia
    Morrison, Abigail
    BEHAVIOURAL BRAIN RESEARCH, 2016, 311 : 110 - 121
  • [10] Improving performance of WSNs in IoT applications by transmission power control and adaptive learning rates in reinforcement learning
    Chaukiyal, Arunita
    TELECOMMUNICATION SYSTEMS, 2024, 87 (03) : 575 - 591