Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited by: 0
Authors
Zhu, Qingling [1]
Wu, Xiaoqiang [2]
Lin, Qiuzhen [2]
Chen, Wei-Neng [3]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
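Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 18 | 2024
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China
Keywords
LEVEL
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract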
The integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) has emerged as a promising approach for tackling challenges in RL such as sparse rewards, lack of exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities, because the entire actor population stops evolving when the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and reduces susceptibility to local optima. Information shared through a common replay buffer and the PSO algorithm substantially mitigates the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase. Here, only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual to yield superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
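The two-stage schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the PSO variant, the RL algorithm, or any hyperparameters, so everything below is an assumption. Each individual is a plain parameter vector standing in for an actor-critic pair, `fitness` is a hypothetical stand-in for episode return, `gradient_step` is a hypothetical stand-in for an RL update, and `pso_step` uses the canonical PSO velocity rule with illustrative coefficients.

```python
import random

def fitness(x):
    # Toy stand-in for an episode return: higher is better, maximum 0 at the origin.
    return -sum(v * v for v in x)

def gradient_step(x, lr=0.1):
    # Hypothetical stand-in for an RL update: gradient ascent on the toy fitness.
    return [v - lr * 2.0 * v for v in x]

def pso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    # Canonical PSO update for one particle (coefficients are assumptions).
    new_vel = [
        w * v + c1 * rng.random() * (pb - p) + c2 * rng.random() * (gb - p)
        for p, v, pb, gb in zip(pos, vel, pbest, gbest)
    ]
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel

def two_stage_search(pop_size=5, dim=4, stage1_iters=30, stage2_iters=30, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-2, 2) for _ in range(dim)] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    initial_best = fitness(gbest)

    for it in range(stage1_iters + stage2_iters):
        in_stage1 = it < stage1_iters
        best_idx = max(range(pop_size), key=lambda i: fitness(pos[i]))
        for i in range(pop_size):
            # Stage 1: every individual alternates "RL" steps and PSO steps.
            # Stage 2: only the current best gets "RL" steps; the rest use PSO.
            use_rl = (it % 2 == 0) if in_stage1 else (i == best_idx)
            if use_rl:
                pos[i] = gradient_step(pos[i])
            else:
                pos[i], vel[i] = pso_step(pos[i], vel[i], pbest[i], gbest, rng)
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = max(pbest, key=fitness)[:]
    return initial_best, fitness(gbest)

if __name__ == "__main__":
    before, after = two_stage_search()
    print("best fitness before:", before, "after:", after)
```

The sketch only shows how compute is allocated across the population in each stage; a shared replay buffer, real actor and critic networks, and environment rollouts are deliberately left out.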
Pages: 20892-20900
Number of pages: 9