TPN: Triple network algorithm for deep reinforcement learning

Cited by: 1

Authors:

Han, Chen [1]

Wang, Xuanyin [1]

Affiliations:

[1] Zhejiang Univ, Sch Mech Engn, Yuhangtang Rd 388, Hangzhou 310063, Zhejiang, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 591

Keywords:

TPN; Deep reinforcement learning; Target net method; GAME; GO;

DOI:

10.1016/j.neucom.2024.127755

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The target net method has been the foundation of deep reinforcement learning since DeepMind first proposed it in 2015, and almost all currently popular reinforcement learning algorithms include a target network. However, while the slowly updated target network improves the stability of the algorithm, it also reduces its performance. In this paper, the authors design a novel triple-network algorithm (TPN). TPN combines the temporal-difference (TD) algorithm and the policy gradient (PG) theorem, using three networks to estimate the state value (v), the action value (q), and the policy (π). These networks have no primary or secondary distinction; they are trained synchronously and influence each other. The authors found that this TPN architecture greatly improves the convergence and stability of the algorithm without increasing the amount of calculation. Although it is only a basic framework at present, the calculation process of TPN is simple and easy to implement. Experiments show that, in discrete cases, the convergence speed and stability of TPN are better than those of PPO.

Pages: 10
