Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Cited by: 0
Authors
Zhu, Qingling [1]
Wu, Xiaoqiang [2]
Lin, Qiuzhen [2]
Chen, Wei-Neng [3]
Affiliations
[1] Shenzhen Univ, Nat Engn Lab Big Data Syst Comp Technol, Shenzhen, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 18 | 2024
Funding
National Science Fund for Distinguished Young Scholars; National Natural Science Foundation of China
Keywords
LEVEL;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The integration of Evolutionary Algorithms (EA) and Reinforcement Learning (RL) has emerged as a promising approach for tackling several challenges in RL, such as sparse rewards, lack of exploration, and brittle convergence properties. However, existing methods often employ actor networks as the individuals of the EA, which may constrain their exploratory capabilities, because the entire actor population stops evolving when the critic network in RL falls into a local optimum. To alleviate this issue, this paper introduces a Two-stage Evolutionary Reinforcement Learning (TERL) framework that maintains a population containing both actor and critic networks. TERL divides the learning process into two stages. In the initial stage, individuals independently learn actor-critic networks, which are optimized alternately by RL and Particle Swarm Optimization (PSO). This dual optimization fosters greater exploration and reduces susceptibility to local optima. Information shared through a common replay buffer and the PSO algorithm substantially mitigates the computational load of training multiple agents. In the subsequent stage, TERL shifts to a refined exploitation phase. Here, only the best individual undergoes further refinement, while the remaining individuals continue PSO-based optimization. This allocates more computational resources to the best individual in order to yield superior performance. Empirical assessments, conducted across a range of continuous control problems, validate the efficacy of the proposed TERL paradigm.
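The two-stage schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the actor-critic networks are replaced by flat parameter vectors, the RL update by a gradient step on a toy objective, and every name and hyperparameter here (`fitness`, `gradient_step`, `pso_update`, `run_two_stage`, `inertia`, `c1`, `c2`) is a hypothetical choice made for illustration. It only shows the control flow: in stage one every individual receives both updates, and in stage two only the current best individual is refined while the others continue with PSO.

```python
import random


def fitness(x):
    # Toy stand-in for episodic return (assumption): higher is better, maximum at the origin.
    return -sum(xi * xi for xi in x)


def gradient_step(x, lr=0.1):
    # Placeholder for the RL (actor-critic) update: gradient ascent on the toy fitness.
    return [xi - lr * 2.0 * xi for xi in x]


def pso_update(x, v, personal_best, global_best, rng, inertia=0.7, c1=1.5, c2=1.5):
    # Standard PSO velocity and position update for a single particle.
    r1, r2 = rng.random(), rng.random()
    v_new = [
        inertia * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        for xi, vi, pi, gi in zip(x, v, personal_best, global_best)
    ]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new


def run_two_stage(pop_size=5, dim=4, stage1_iters=20, stage2_iters=20, seed=0):
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    vs = [[0.0] * dim for _ in range(pop_size)]
    pbest = [list(x) for x in xs]
    gbest = list(max(pbest, key=fitness))
    initial_best = fitness(gbest)

    for it in range(stage1_iters + stage2_iters):
        stage_one = it < stage1_iters
        best_idx = max(range(pop_size), key=lambda i: fitness(xs[i]))
        for i in range(pop_size):
            # Stage one: everyone gets the "RL" step. Stage two: only the best does.
            if stage_one or i == best_idx:
                xs[i] = gradient_step(xs[i])
            # Stage one: everyone gets PSO. Stage two: everyone except the best does.
            if stage_one or i != best_idx:
                xs[i], vs[i] = pso_update(xs[i], vs[i], pbest[i], gbest, rng)
            if fitness(xs[i]) > fitness(pbest[i]):
                pbest[i] = list(xs[i])
        gbest = list(max(pbest, key=fitness))

    return initial_best, fitness(gbest)


if __name__ == "__main__":
    start, end = run_two_stage()
    print("best fitness before:", start, "after:", end)
```

Because the global best is always taken from the personal bests, which are only replaced on improvement, the reported best fitness never decreases over the run. In a real setting the fitness would come from environment rollouts and the gradient step from an off-policy actor-critic update over a shared replay buffer, which this sketch does not model.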
Pages: 20892-20900
Number of pages: 9