Evolutionary reinforcement learning via cooperative coevolutionary negatively correlated search

Cited by: 16
Authors
Yang, Peng [1 ]
Zhang, Hu [2 ]
Yu, Yanglong [1 ]
Li, Mingjia [1 ]
Tang, Ke [1 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen 518055, Peoples R China
[2] Beijing Electromech Engn Inst, Sci & Technol Complex Syst Control & Intelligent, Beijing 100074, Peoples R China
Keywords
Evolutionary algorithms; Deep reinforcement learning; Cooperative coevolution; Optimization; Environment
DOI
10.1016/j.swevo.2021.100974
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Evolutionary algorithms (EAs) have been successfully applied to optimize policies for Reinforcement Learning (RL) tasks owing to their exploration ability. The recently proposed Negatively Correlated Search (NCS) offers a distinctive parallel exploration behavior and is therefore expected to facilitate RL more effectively. However, since the commonly adopted neural policies usually involve millions of parameters to be optimized, directly applying NCS to RL faces the great challenge of a large-scale search space. To address this issue, this paper presents an NCS-friendly Cooperative Coevolution (CC) framework that scales up NCS while largely preserving its parallel exploration behavior. The way in which traditional CC can deteriorate NCS is also discussed. Empirical studies on 10 popular Atari games show that the proposed method significantly outperforms three state-of-the-art deep RL methods with 50% less computation time, by effectively exploring a 1.7 million-dimensional search space.
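To make the abstract's core idea concrete, the following is a minimal Python sketch of NCS-style search inside a cooperative-coevolution (CC) decomposition, run on a toy objective rather than an actual neural policy. The block split, the nearest-neighbour diversity proxy, and the simplified acceptance rule are illustrative assumptions, not the paper's exact algorithm (NCS proper compares search distributions via a Bhattacharyya distance and adapts step sizes, both omitted here).

import numpy as np

# Toy stand-in for the (negated) episode return of an RL policy: minimise.
def sphere(x):
    return np.sum(x ** 2)

dim, n_blocks, pop, iters = 60, 3, 5, 200
blocks = np.array_split(np.arange(dim), n_blocks)  # CC: partition the variables
rng = np.random.default_rng(0)

context = rng.standard_normal(dim)  # best-so-far full solution (context vector)
xs = [rng.standard_normal((pop, len(b))) for b in blocks]  # one subpopulation per block
sigmas = [np.ones(pop) for _ in blocks]  # per-individual step sizes (not adapted here)

def f_block(k, part):
    # Evaluate a block by plugging it into the shared context vector.
    full = context.copy()
    full[blocks[k]] = part
    return sphere(full)

for t in range(iters):
    lam = 1.0 + rng.random()  # randomised fitness/diversity trade-off
    for k in range(n_blocks):
        for i in range(pop):
            child = xs[k][i] + sigmas[k][i] * rng.standard_normal(len(blocks[k]))
            f_old, f_new = f_block(k, xs[k][i]), f_block(k, child)
            # Diversity proxy: distance to the nearest other individual in the
            # same subpopulation (a crude substitute for NCS's Bhattacharyya
            # distance between search distributions).
            others = np.delete(xs[k], i, axis=0)
            d_old = np.min(np.linalg.norm(others - xs[k][i], axis=1))
            d_new = np.min(np.linalg.norm(others - child, axis=1))
            # Accept the child if it is good *and* keeps the population spread out.
            if f_new / (d_new + 1e-12) < lam * f_old / (d_old + 1e-12):
                xs[k][i] = child
        # Cooperative step: write the block's best individual into the context.
        best = min(range(pop), key=lambda i: f_block(k, xs[k][i]))
        context[blocks[k]] = xs[k][best]

print("final objective:", sphere(context))

In the paper itself, each block would hold a slice of the roughly 1.7 million network weights and fitness would be measured by Atari episode returns; the sketch only mirrors the control flow at toy scale.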
Pages: 11