Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited by: 0
Authors
Zhu, Qingling [1]
Wu, Xiaoqiang [2]
Lin, Qiuzhen [2]
Chen, Wei-Neng [3]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 18 | 2024
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China
Keywords
LEVEL;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The integration of Evolutionary Algorithm (EA) and Reinforcement Learning (RL) has emerged as a promising approach for tackling some challenges in RL, such as sparse rewards, lack of exploration, and brittle convergence properties. However, existing methods often employ actor networks as individuals of EA, which may constrain their exploratory capabilities, as the entire actor population will stop evolving when the critic network in RL falls into local optima. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration, curbing susceptibility to local optima. Shared information from a common replay buffer and PSO algorithm substantially mitigates the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase. Here, only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual for yielding superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
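The two-stage schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the actor-critic networks are replaced by plain parameter vectors, the RL update is replaced by a gradient step on a hypothetical toy objective, and every name and hyperparameter below (`fitness`, `gradient_step`, `pso_step`, `two_stage_search`, `w`, `c1`, `c2`) is an assumption made for illustration. Only the schedule follows the abstract: all individuals alternate between the two optimizers in stage one, and in stage two only the best individual is refined while the rest continue with PSO.

```python
import random


def fitness(x):
    # Hypothetical toy objective standing in for an episode return.
    # Higher is better; the maximum (0) is at the origin.
    return -sum(v * v for v in x)


def gradient_step(x, lr=0.1):
    # Stand-in for an RL update (assumption): gradient ascent on the toy fitness.
    return [v - lr * 2.0 * v for v in x]


def pso_step(pos, vel, pbest, gbest, rng, w=0.5, c1=1.5, c2=1.5):
    # Standard PSO velocity and position update for one particle.
    new_vel = [
        w * v + c1 * rng.random() * (pb - p) + c2 * rng.random() * (gb - p)
        for p, v, pb, gb in zip(pos, vel, pbest, gbest)
    ]
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


def two_stage_search(pop_size=5, dim=4, stage1_iters=20, stage2_iters=20, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-2, 2) for _ in range(dim)] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    initial_best = fitness(gbest)

    def record(i):
        # Update the personal best of individual i and the global best.
        nonlocal gbest
        if fitness(pos[i]) > fitness(pbest[i]):
            pbest[i] = pos[i][:]
        if fitness(pbest[i]) > fitness(gbest):
            gbest = pbest[i][:]

    for it in range(stage1_iters + stage2_iters):
        stage2 = it >= stage1_iters
        best_idx = max(range(pop_size), key=lambda i: fitness(pbest[i]))
        for i in range(pop_size):
            # Stage 1: every individual gets the gradient (RL stand-in) update.
            # Stage 2: only the current best individual is refined this way.
            if not stage2 or i == best_idx:
                pos[i] = gradient_step(pos[i])
                record(i)
            # Stage 1: every individual also gets a PSO update.
            # Stage 2: only the remaining individuals continue with PSO.
            if not stage2 or i != best_idx:
                pos[i], vel[i] = pso_step(pos[i], vel[i], pbest[i], gbest, rng)
                record(i)
    return initial_best, fitness(gbest)
```

Calling `two_stage_search()` returns the best toy fitness before and after the search. In a real setting the gradient step would be an actor-critic update drawing on a shared replay buffer, which this sketch does not model.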
Pages: 20892-20900
Page count: 9
Related papers
48 in total
  • [1] Incorporating Explanations to Balance the Exploration and Exploitation of Deep Reinforcement Learning
    Wang, Xinzhi
    Liu, Yang
    Chang, Yudong
    Jiang, Chao
    Zhang, Qingjie
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 : 200 - 211
  • [2] Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation
    Yao, Yao
    Xiao, Li
    An, Zhicheng
    Zhang, Wanpeng
    Luo, Dijun
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4202 - 4208
  • [3] Two-Stage Hybrid Extreme Learning Machine for Sequential Imbalanced Data
    Mao, Wentao
    Wang, Jinwan
    He, Ling
    Tian, Yangyang
    PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 423 - 433
  • [4] Diversity Evolutionary Policy Deep Reinforcement Learning
    Liu, Jian
    Feng, Liming
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [5] Online sequential prediction of imbalance data with two-stage hybrid strategy by extreme learning machine
    Mao, Wentao
    Wang, Jinwan
    He, Ling
    Tian, Yangyang
    NEUROCOMPUTING, 2017, 261 : 94 - 105
  • [6] Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents
    Erdodi, Laszlo
    Sommervoll, Avald Aslaugson
    Zennaro, Fabio Massimo
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2021, 61
  • [7] NAEM: Noisy Attention Exploration Module for Deep Reinforcement Learning
    Cai, Zhenwen
    Lee, Feifei
    Hu, Chunyan
    Kotani, Koji
    Chen, Qiu
    IEEE ACCESS, 2021, 9 : 154600 - 154611
  • [8] Two-stage training algorithm for AI robot soccer
    Kim, Taeyoung
    Vecchietti, Luiz Felipe
    Choi, Kyujin
    Sariel, Sanem
    Har, Dongsoo
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [9] A two-stage approach to modeling vacant taxi movements
    Wong, R. C. P.
    Szeto, W. Y.
    Wong, S. C.
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2015, 59 : 147 - 163
  • [10] A Two-Stage Approach to Modeling Vacant Taxi Movements
    Wong, R. C. P.
    Szeto, W. Y.
    Wong, S. C.
    21ST INTERNATIONAL SYMPOSIUM ON TRANSPORTATION AND TRAFFIC THEORY, 2015, 7 : 254 - 275