NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Times cited: 0

Authors:

Paczolay, Gabor [1]

Harmati, Istvan [1]

Affiliation:

[1] Budapest University of Technology and Economics, Department of Control Engineering, Magyar tudosok krt 2, 1 bldg, H-1117 Budapest, Hungary

Source:

ACTA POLYTECHNICA HUNGARICA | 2024 / Vol. 21 / No. 11

Keywords:

reinforcement learning; DQN; NPV; NPV-DQN;

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Subject classification code:

08;

Abstract:

The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are worth at the present time step. In this paper, a system is used that estimates Q-values with two distinct discount factors. These estimations can later be merged into a single network to make the computations more efficient. The choice of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous this maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
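The abstract describes the mechanism only in words. The sketch below is a hypothetical illustration of that idea, not the authors' implementation. It assumes a single network with a shared hidden layer and two linear output heads (one per discount factor), plain one-step DQN targets for each head, example values GAMMA_SHORT = 0.9 and GAMMA_LONG = 0.99, and a softmax-based confidence score standing in for the paper's unspecified "relative value of the maximum" criterion. All function and constant names are invented for illustration.

```python
import numpy as np

# Hypothetical discount factors for the two value estimates (not taken from the paper).
GAMMA_SHORT = 0.9
GAMMA_LONG = 0.99


def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * r_t: how much a reward sequence is worth now."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def two_head_forward(state, w_body, w_short, w_long):
    """One network, two output heads sharing a hidden layer (assumed architecture).

    Returns (q_short, q_long): one Q-value per action for each discount factor.
    """
    hidden = np.tanh(w_body @ state)
    return w_short @ hidden, w_long @ hidden


def td_targets(reward, next_q_short, next_q_long, done,
               gamma_short=GAMMA_SHORT, gamma_long=GAMMA_LONG):
    """One-step DQN-style targets, computed separately for each head."""
    not_done = 0.0 if done else 1.0
    y_short = reward + gamma_short * not_done * np.max(next_q_short)
    y_long = reward + gamma_long * not_done * np.max(next_q_long)
    return y_short, y_long


def short_term_confidence(q_short, temperature=1.0):
    """Assumed ambiguity measure: softmax probability of the greedy action.

    Close to 1.0 when one action clearly dominates, close to 1/num_actions
    when the short-term Q-values are nearly equal.
    """
    z = (q_short - np.max(q_short)) / temperature  # shift for numerical stability
    p = np.exp(z) / np.sum(np.exp(z))
    return float(np.max(p))


def select_action(q_short, q_long, rng, temperature=1.0):
    """Follow the short-term head with probability equal to its confidence,
    otherwise act greedily with respect to the long-term head."""
    if rng.random() < short_term_confidence(q_short, temperature):
        return int(np.argmax(q_short)), "short"
    return int(np.argmax(q_long)), "long"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.1, -0.2, 0.05, 0.0])  # e.g. a 4-dimensional cartpole observation
    w_body = rng.normal(size=(8, 4))   # random, untrained weights for illustration
    w_short = rng.normal(size=(2, 8))
    w_long = rng.normal(size=(2, 8))
    q_short, q_long = two_head_forward(state, w_body, w_short, w_long)
    print(select_action(q_short, q_long, rng))
```

Sharing one hidden layer between both heads is one way to read "merged into one network": a single forward pass yields both sets of Q-values. The softmax temperature controls how sharply a clear maximum translates into a high probability of trusting the short-term head.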

Pages: 175-190

Number of pages: 16