Diversity Evolutionary Policy Deep Reinforcement Learning

Cited by: 6
Authors
Liu, Jian [1 ,2 ]
Feng, Liming [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
[2] China Univ Min & Technol, Minist Educ, Engn Res Ctr Intelligent Control Underground Space, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
LEVEL;
DOI
10.1155/2021/5300189
CLC Number
Q [Biological Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Reinforcement learning algorithms based on policy gradients may fall into local optima due to vanishing gradients during the update process, which in turn limits the exploration ability of the reinforcement learning agent. To solve this problem, this paper combines the cross-entropy method (CEM) from evolutionary policy search, the maximum mean discrepancy (MMD), and the twin delayed deep deterministic policy gradient (TD3) algorithm into a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm. Using the maximum mean discrepancy as a measure of the distance between policies, some policies in the population maximize both the cumulative return and their distance from the previous generation of policies during the gradient update. Furthermore, combining cumulative returns and interpolicy distance into the population's fitness encourages more diversity in the offspring policies, which in turn reduces the risk of falling into local optima caused by vanishing gradients. Results in the MuJoCo test environments show that DEPRL achieves excellent performance on continuous control tasks; in particular, in the Ant-v2 environment, the final return of DEPRL is nearly 20% higher than that of TD3.
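A minimal NumPy sketch of the diversity mechanism the abstract describes: squared MMD with a Gaussian kernel between action samples drawn from two policies on the same states, added to the cumulative return to form the population fitness. The kernel choice, the weight alpha, and all names here are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of MMD-based fitness; kernel and alpha are assumptions.
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel: k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2)).
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(actions_p, actions_q, sigma=1.0):
    # Biased estimate of squared MMD between two sets of policy actions.
    return (gaussian_kernel(actions_p, actions_p, sigma).mean()
            + gaussian_kernel(actions_q, actions_q, sigma).mean()
            - 2.0 * gaussian_kernel(actions_p, actions_q, sigma).mean())

def fitness(cumulative_return, actions_new, actions_prev, alpha=1.0):
    # Fitness = episodic return + alpha * MMD distance to the
    # previous-generation policy, rewarding offspring that are both
    # high-performing and different from their parents.
    return cumulative_return + alpha * mmd_squared(actions_new, actions_prev)

# Toy usage: actions each policy would produce on a shared batch of states.
rng = np.random.default_rng(0)
actions_new = np.tanh(rng.normal(size=(256, 6)))   # offspring policy actions
actions_prev = np.tanh(rng.normal(size=(256, 6)))  # parent policy actions
print(fitness(950.0, actions_new, actions_prev))
```

In this reading, the MMD term only shifts the fitness ranking inside the population; the gradient-based TD3 updates remain unchanged, so the diversity pressure acts at the evolutionary selection level.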
Pages: 11