Metaheuristic-based weight optimization for robust deep reinforcement learning in continuous control

Citations: 0
Authors
Ko, Gwang-Jong [1]
Huh, Jaeseok [2]
Affiliations
[1] Korea Univ, Sch Ind & Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Tech Univ Korea, Dept Business Adm, 237 Sangidaehak Ro, Siheung Si 15073, Gyeonggi Do, South Korea
Funding
National Research Foundation, Singapore
Keywords
Deep reinforcement learning; Continuous control; Metaheuristic; Swarm intelligence algorithm; Particle swarm optimization; Grey wolf optimizer; Neural networks
DOI
10.1016/j.swevo.2025.101920
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent studies, policy-based deep reinforcement learning (DRL) algorithms have exhibited superior performance on continuous control problems such as robotic arm control and robot gait learning. However, these algorithms frequently face challenges inherent in gradient descent-based weight optimization, including susceptibility to local optima, slow learning due to saddle points, approximation errors, and suboptimal hyperparameters. The resulting instability leads to significant performance discrepancies among agent instances trained under identical settings, which complicates the practical application of reinforcement learning. To address this, we propose a metaheuristic-based weight optimization framework designed to mitigate learning instability in DRL for continuous control tasks. The framework introduces a two-phase optimization process in which an additional search phase using swarm intelligence algorithms is conducted at the end of the DRL learning phase. In numerical experiments on robot locomotion tasks, the proposed framework demonstrated superior and more stable performance compared with conventional DRL algorithms.
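The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the swarm update rules, the population initialization, or the fitness function, so everything below is an assumption. The sketch uses a textbook particle swarm optimization (PSO) update, seeds the swarm around a weight vector that stands in for the output of the DRL learning phase, and uses a hypothetical `toy_return` function in place of the average episodic return an agent would obtain in a real environment. The names `pso_refine`, `toy_return`, `TARGET`, and all parameter values are illustrative.

```python
import random


def pso_refine(weights, fitness, n_particles=20, n_iters=50,
               sigma=0.1, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Refine a weight vector with a basic PSO search (maximizes `fitness`)."""
    rng = random.Random(seed)
    dim = len(weights)
    # Assumption: particle 0 is the trained weight vector itself and the
    # remaining particles are Gaussian perturbations of it.
    pos = [list(weights)] + [
        [w + rng.gauss(0.0, sigma) for w in weights]
        for _ in range(n_particles - 1)
    ]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_fit = [fitness(p) for p in pos]       # personal best fitness values
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update: inertia + cognitive + social terms.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for "average episodic return": highest at TARGET.
TARGET = [0.5, -1.0, 2.0]


def toy_return(w):
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


trained = [0.3, -0.7, 1.6]  # pretend result of the gradient-based learning phase
refined, refined_fit = pso_refine(trained, toy_return)
print(toy_return(trained), refined_fit)
```

Because the trained weights are kept as one of the particles and the global best is only ever replaced by a strictly better candidate, the refined weights in this sketch can never score worse than the starting point on the chosen fitness function. In a real setting the fitness would be a noisy estimate from environment rollouts, so that guarantee would hold only with respect to the sampled returns.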
Pages: 14