NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Cited by: 0

Authors:

Paczolay, Gabor [1]

Harmati, Istvan [1]

Affiliation:

[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary

Source:

ACTA POLYTECHNICA HUNGARICA | 2024, Vol. 21, No. 11

Keywords:

reinforcement learning; DQN; NPV; NPV-DQN

DOI:

Not available

Chinese Library Classification:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time step. In this paper, a system is used in which Q-values are estimated with two distinct discount factors. These estimates can later be merged into one network to make the computations more efficient. The decision of which network to use is based on the relative value of the short-term network's maximum: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
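The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the names `max_unambiguity` and `select_action` are hypothetical, and the softmax probability of the greedy action is only one plausible way to measure how unambiguous the short-term maximum is. It is not presented as the authors' method.

```python
import math
import random


def max_unambiguity(q_values, temperature=1.0):
    """Softmax probability of the greedy action (assumed measure).

    Close to 1 when one Q-value clearly dominates, 1/n when all n
    Q-values are tied.
    """
    m = max(q_values)
    # Subtract the maximum for numerical stability; the greedy entry is exp(0) = 1.
    exps = [math.exp((q - m) / temperature) for q in q_values]
    return 1.0 / sum(exps)


def select_action(q_short, q_long, rng=random, temperature=1.0):
    """Pick the greedy action of either the short-term or long-term estimate.

    q_short: Q-values estimated with the smaller discount factor.
    q_long:  Q-values estimated with the larger discount factor.
    rng:     any object with a random() method returning a float in [0, 1).
    The short-term estimate is used with probability p_short.
    """
    p_short = max_unambiguity(q_short, temperature)
    chosen = q_short if rng.random() < p_short else q_long
    action = max(range(len(chosen)), key=chosen.__getitem__)
    return action, p_short


if __name__ == "__main__":
    # Hypothetical Q-values for a two-action state.
    action, p_short = select_action([0.2, 0.3], [1.5, 0.9])
    print(action, round(p_short, 3))
```

With nearly tied short-term values, as in the usage example, `p_short` stays close to 0.5 and the long-term estimate decides about half the time; a sharply peaked short-term estimate would dominate the choice instead.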


Pages: 175-190

Page count: 16

