Survey of Deep Reinforcement Learning Methods with Evolutionary Algorithms

Authors
Lü S. [1,2]
Gong X.-Y. [1,2]
Zhang Z.-H. [1,2]
Han S. [1,2,3]
Zhang J.-W. [1,2]
Affiliations
[1] College of Computer Science and Technology, Jilin University, Changchun
[2] Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun
[3] Department of Information and Computing Sciences, Utrecht University, Utrecht
Source
Jisuanji Xuebao/Chinese Journal of Computers, 2022, Vol. 45, No. 7
Funding
National Natural Science Foundation of China
Keywords
Deep reinforcement learning; Evolution strategies; Evolutionary algorithms; Genetic algorithms; Reinforcement learning
DOI
10.11897/SP.J.1016.2022.01478
Abstract
Deep reinforcement learning is one of the most important branches of machine learning. By interacting directly with the environment, it learns end to end and can solve high-dimensional, large-scale problems. Although deep reinforcement learning has achieved remarkable results, it still suffers from insufficient exploration of the environment, poor robustness, and gradients that are easily misled by deceptive rewards. Evolutionary algorithms, by contrast, generally offer strong global search ability, robustness, and parallelism. Methods that combine evolutionary algorithms with deep reinforcement learning to compensate for its shortcomings have therefore recently become a research hotspot. This paper focuses on the application of evolutionary algorithms in model-free deep reinforcement learning methods. We first introduce evolutionary algorithms and the basic methods of reinforcement learning. We then describe the characteristics, advantages, disadvantages, and applicable tasks of evolutionary algorithms, deep reinforcement learning algorithms, and methods that combine the two, which shows the necessity of the combined methods from a different perspective. Next, we elaborate on two types of reinforcement learning methods with evolutionary algorithms: reinforcement learning with evolutionary-algorithm-guided policy search, and the combination of evolutionary algorithms and deep reinforcement learning. For reinforcement learning with evolutionary-algorithm-guided policy search, we divide policy search methods into parameter distribution search methods, policy gradient approximation methods, and policy population search methods. Parameter distribution search methods treat the parameters of a policy as a probability distribution and sample parameters from this distribution to form new policies.
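As an illustration of parameter distribution search, the following is a minimal sketch in the style of the cross-entropy method, assuming a diagonal Gaussian over the policy parameters that is refit to the highest-fitness samples at each iteration. The toy `fitness` function stands in for an episode return, and `cem_search` and its hyperparameters are hypothetical choices for illustration, not a method taken from the paper.

```python
import random
import statistics


def fitness(theta):
    # Hypothetical toy objective standing in for an episode return;
    # it is maximized at theta = (1.0, -2.0, 0.5).
    target = (1.0, -2.0, 0.5)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def cem_search(fitness_fn, dim, iterations=50, pop_size=50,
               elite_frac=0.2, init_std=2.0, seed=0):
    """Search over a Gaussian distribution of policy parameters (CEM-style sketch)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(iterations):
        # Sample candidate policy parameter vectors from the current distribution.
        population = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(pop_size)]
        # Keep the highest-fitness candidates (the elites).
        elites = sorted(population, key=fitness_fn, reverse=True)[:n_elite]
        # Refit the mean and per-dimension std to the elites; the small floor
        # keeps the distribution from collapsing to zero variance.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return mean


best = cem_search(fitness, dim=3)
print([round(x, 2) for x in best])
```

The constant added to the standard deviation is one simple guard against the distribution shrinking before it reaches a good region; practical variants use other noise schedules as well.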
Policy gradient approximation methods use the fitness of policies to approximate the gradient used to update the parameters. Policy population search methods search directly over the individuals in a policy population and select those with higher fitness. We then focus on methods that combine evolutionary algorithms with deep reinforcement learning, which currently attract considerable research interest. These include evolutionary algorithm experience-guided deep reinforcement learning methods and evolutionary algorithm module-embedded deep reinforcement learning methods. The former use the experience that individuals collect through continual interaction with the environment to guide the value network of reinforcement learning, while the latter embed an evolutionary algorithm as an auxiliary module in the reinforcement learning process. We further compare and analyze these methods in detail. In particular, we compare the characteristics of the various algorithms that combine evolutionary algorithms and deep reinforcement learning, including both without-feedback guidance methods and with-feedback guidance methods. We also compare the performance of widely used with-feedback guidance algorithms on the MuJoCo continuous control tasks, and give a detailed analysis together with future directions for improvement and research. Finally, we summarize all the combined methods of evolutionary algorithms and deep reinforcement learning discussed in the paper, and examine the research focus and development trends of this field. Although evolutionary deep reinforcement learning frameworks have been proposed, we believe these methods still require further theoretical study to balance exploration and exploitation. © 2022, Science Press. All rights reserved.
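Policy gradient approximation can be sketched in the style of the evolution strategies applied to reinforcement learning by Salimans et al. (2017), assuming Gaussian parameter perturbations with mirrored sampling: fitness differences between perturbed policies weight the noise vectors to form a gradient estimate, which then drives ordinary gradient ascent. The toy `fitness` function and the names `es_gradient` and `es_optimize` are hypothetical, and the hyperparameters are illustrative only.

```python
import random


def fitness(theta):
    # Hypothetical toy objective standing in for an episode return;
    # it is maximized at theta = (3.0, -1.0).
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2)


def es_gradient(fitness_fn, theta, sigma, n_pairs, rng):
    """Estimate the gradient of E[F(theta + sigma * eps)] from fitness values only."""
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        # Mirrored sampling: evaluate both theta + sigma*eps and theta - sigma*eps.
        f_plus = fitness_fn([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness_fn([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (f_plus - f_minus) * e
    # Each pair contributes (F+ - F-) * eps / (2 * sigma); average over pairs.
    return [g / (2.0 * sigma * n_pairs) for g in grad]


def es_optimize(fitness_fn, theta, steps=200, sigma=0.1, lr=0.05,
                n_pairs=20, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = es_gradient(fitness_fn, theta, sigma, n_pairs, rng)
        # Gradient ascent on fitness using the fitness-based estimate.
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


theta = es_optimize(fitness, [0.0, 0.0])
print([round(x, 2) for x in theta])
```

Evaluating each noise vector in both directions cancels the unperturbed fitness from the estimate, which lowers its variance without needing a separate baseline.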
Pages: 1478-1499
Page count: 21