QVDDPG: QV Learning with Balanced Constraint in Actor-Critic Framework

Citations: 0
Authors
Huang, Jiao [1 ]
Hu, Jifeng [1 ]
Yang, Luheng [2 ]
Ren, Zhihang [3 ]
Chen, Hechang [1 ]
Yang, Bo [2 ]
Affiliations
[1] Jilin Univ, Sch Artificial Intelligence, Changchun, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China
[3] China FAW Grp Corp, Changchun, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Funding
National Key Research and Development Program of China;
Keywords
deep learning; reinforcement learning;
DOI
10.1109/IJCNN54540.2023.10191805
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The actor-critic framework has achieved tremendous success in a wide range of decision-making scenarios. Nevertheless, when updating the values of new states and actions in long-horizon settings, these methods suffer from value misestimation and high gradient variance, which significantly reduce the convergence speed and the robustness of the learned policy. These problems severely limit the application scope of such methods. In this paper, we propose QVDDPG, a deep RL algorithm based on an iterative target-value update process. The QV learning method alleviates misestimation by combining the guidance of the Q value with the fast convergence of the V value, thereby accelerating convergence. In addition, the actor uses a balanced, constrained gradient and maintains a hidden state for the continuous-action-space network to improve the robustness of the model. We derive the update relation among the value functions and the constraint conditions for gradient estimation. We evaluate our method on PyBullet benchmarks and achieve state-of-the-art performance. Moreover, we demonstrate that our method attains higher robustness and faster convergence across different tasks than competing algorithms.
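The QV learning idea the abstract refers to builds on classic tabular QV-learning, in which both the Q and V estimates bootstrap from a shared TD target computed with the state value V(s'), whose faster convergence then guides the Q estimate. A minimal tabular sketch of that shared-target update (an illustrative assumption, not the paper's QVDDPG implementation, which uses deep networks in the actor-critic setting):

```python
import numpy as np

def qv_update(Q, V, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular QV-learning step: Q and V share a TD target
    built from V(s'), so the fast-converging V estimate guides Q."""
    target = r + gamma * V[s_next]         # shared TD target using V, not max_a Q
    V[s] += alpha * (target - V[s])        # state-value update
    Q[s, a] += alpha * (target - Q[s, a])  # action-value update toward the V-based target
    return Q, V

# Toy usage on a 2-state, 2-action problem.
Q = np.zeros((2, 2))
V = np.zeros(2)
Q, V = qv_update(Q, V, s=0, a=1, r=1.0, s_next=1)
```

QVDDPG replaces these tables with critic networks and adds the constrained balanced gradient on the actor side; the tabular form only illustrates why the V-based target can reduce misestimation relative to bootstrapping from Q alone.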
Pages: 8