NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Cited by: 0
Authors
Paczolay, Gabor [1 ]
Harmati, Istvan [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary
Keywords
reinforcement learning; DQN; NPV; NPV-DQN;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time-step. In this paper, a system is used in which Q values are estimated with two distinct discount factors. These estimates can later be merged into one network to make the computations more efficient. The decision of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous the maximum is, the more probability is assigned to selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
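The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption: the function names `max_action_confidence` and `select_action` are hypothetical, and the softmax probability of the greedy action is used as a stand-in for how unambiguous the maximum of the short-term estimate is. It is not the authors' implementation.

```python
import math
import random


def max_action_confidence(q_values, temperature=1.0):
    """Softmax probability of the greedy action.

    Assumption: this stands in for "how unambiguous the maximum is";
    the abstract does not specify the actual measure.
    """
    top = max(q_values)
    # Subtract the maximum for numerical stability; the greedy term becomes exp(0) = 1.
    exps = [math.exp((q - top) / temperature) for q in q_values]
    return 1.0 / sum(exps)


def select_action(q_short, q_long, rng=random):
    """Pick the greedy action from the short- or long-term Q estimate.

    q_short: Q values estimated with the smaller discount factor.
    q_long:  Q values estimated with the larger discount factor.
    The short-term estimate is chosen with probability equal to its confidence.
    """
    p_short = max_action_confidence(q_short)
    source = q_short if rng.random() < p_short else q_long
    action = max(range(len(source)), key=source.__getitem__)
    return action, p_short


if __name__ == "__main__":
    # A clear short-term maximum makes the short-term estimate likely to be used.
    print(select_action([5.0, 0.0], [0.0, 1.0]))
    # A tie in the short-term estimate gives it only a 50% chance here.
    print(select_action([1.0, 1.0], [0.0, 1.0]))
```

With two actions, a tie in the short-term estimate gives a confidence of 0.5, while a large gap pushes it towards 1, so the long-term estimate is consulted mainly when the short-term one cannot separate the actions.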
Pages: 175 - 190
Number of pages: 16
Related papers
39 in total
  • [21] Variable Speed Limit Control for Mixed Traffic Flow on Highways Based on Deep-Reinforcement Learning
    Gao, Heyao
    Jia, Hongfei
    Wu, Ruiyi
    Huang, Qiuyang
    Tian, Jingjing
    Liu, Chao
    Wang, Xiaochao
    [J]. JOURNAL OF TRANSPORTATION ENGINEERING PART A-SYSTEMS, 2024, 150 (03)
  • [22] Cooperative control algorithm of multi-intersection variable-direction lanes based on reinforcement learning
    Xu X.-G.
    Xia Y.-J.
    Zhu S.-Y.
    Kuang L.
    [J]. Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (05) : 987 - 994 and 1005
  • [23] Reinforcement Learning-Based Adaptive Optimal Fuzzy MPPT Control for Variable Speed Wind Turbine
    Nga Thi-Thuy Vu
    Ha Duc Nguyen
    Anh Tuan Nguyen
    [J]. IEEE ACCESS, 2022, 10 : 95771 - 95780
  • [24] RCP: A Reinforcement Learning-Based Retransmission Control Protocol for Delivery and Latency Sensitive Applications
    Wang, Yu
    Abouzeid, Alhussein A.
    [J]. 30TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2021), 2021,
  • [25] Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy
    van Rooijen, J. C.
    Grondman, I.
    Babuska, R.
    [J]. MECHATRONICS, 2014, 24 (08) : 966 - 974
  • [26] Online Reinforcement Learning Control of Nonlinear Dynamic Systems: A State-action Value Function Based Solution
    Asl, Hamed Jabbari
    Uchibe, Eiji
    [J]. NEUROCOMPUTING, 2023, 544
  • [27] Variable-parameter MPC Multi-objective Control for Intelligent Vehicle Path Tracking Based on Reinforcement Learning
    Wang H.-B.
    Wang C.-Y.
    Zhao L.-F.
    Hu Y.-P.
    [J]. Zhongguo Gonglu Xuebao/China Journal of Highway and Transport, 2024, 37 (03) : 157 - 169
  • [28] Reinforcement learning based variable damping control of wearable robotic limbs for maintaining astronaut pose during extravehicular activity
    Zhao, Sikai
    Zheng, Tianjiao
    Sui, Dongbao
    Zhao, Jie
    Zhu, Yanhe
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [29] Barrier Lyapunov function based reinforcement learning control for air-breathing hypersonic vehicle with variable geometry inlet
    Liu, Chen
    Dong, Chaoyang
    Zhou, Zhijie
    Wang, Zhaolei
    [J]. AEROSPACE SCIENCE AND TECHNOLOGY, 2020, 96
  • [30] Reinforcement Learning-Based Variable Speed Limit Control Strategy to Reduce Traffic Congestion at Freeway Recurrent Bottlenecks
    Li, Zhibin
    Liu, Pan
    Xu, Chengcheng
    Duan, Hui
    Wang, Wei
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2017, 18 (11) : 3204 - 3217