Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited: 0
Authors
Zhu, Qingling [1 ]
Wu, Xiaoqiang [2 ]
Lin, Qiuzhen [2 ]
Chen, Wei-Neng [3 ]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 18, 2024
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China
Keywords
LEVEL;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The integration of Evolutionary Algorithms (EA) and Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, insufficient exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities: the entire actor population stops evolving when the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration, curbing susceptibility to local optima. Information sharing through a common replay buffer and the PSO algorithm substantially mitigates the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase: only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual, yielding superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
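The two-stage scheme described in the abstract can be sketched as follows. This is a purely illustrative toy, not the paper's implementation: a quadratic fitness stands in for actual actor-critic training and environment rollouts, and the names `fitness`, `pso_step`, and `rl_step` are hypothetical. It only shows the control flow of stage 1 (every individual alternates an RL-style gradient step with a PSO update) and stage 2 (only the best individual is refined further).

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Toy stand-in for an agent's episodic return (to be maximized);
    # the optimum is the all-ones vector.
    return -float(np.sum((x - 1.0) ** 2))

# Population of flattened "actor-critic" parameter vectors.
dim, pop_size = 8, 6
pop = rng.normal(size=(pop_size, dim))
vel = np.zeros_like(pop)
pbest = pop.copy()
pbest_f = np.array([fitness(p) for p in pop])

def pso_step(pop, vel, pbest, pbest_f, w=0.7, c1=1.5, c2=1.5):
    # Standard PSO velocity/position update toward personal and global bests.
    gbest = pbest[np.argmax(pbest_f)]
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    vel = w * vel + c1 * r1 * (pbest - pop) + c2 * r2 * (gbest - pop)
    pop = pop + vel
    f = np.array([fitness(p) for p in pop])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = pop[improved], f[improved]
    return pop, vel, pbest, pbest_f

def rl_step(pop, lr=0.05):
    # Stand-in for a gradient-based RL update on each individual:
    # an ascent step using the analytic gradient of the toy fitness.
    grad = 2.0 * (1.0 - pop)
    return pop + lr * grad

# Stage 1: every individual alternates RL-style and PSO updates.
for _ in range(30):
    pop = rl_step(pop)
    pop, vel, pbest, pbest_f = pso_step(pop, vel, pbest, pbest_f)

# Stage 2: only the best individual is refined further; in TERL the
# rest of the population would continue PSO-based search.
best = pbest[int(np.argmax(pbest_f))].copy()
for _ in range(50):
    best = rl_step(best[None, :])[0]

print(fitness(best))
```

On this toy objective the refined individual converges to the optimum, so the printed fitness approaches zero; with real actor-critic networks each `rl_step` would instead be a TD3/DDPG-style update against a shared replay buffer.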
Pages: 20892-20900
Page count: 9