Metaheuristic-based weight optimization for robust deep reinforcement learning in continuous control

Cited by: 0
Authors
Ko, Gwang-Jong [1]
Huh, Jaeseok [2]
Affiliations
[1] Korea Univ, Sch Ind & Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Tech Univ Korea, Dept Business Adm, 237 Sangidaehak Ro, Siheung Si 15073, Gyeonggi Do, South Korea
Funding
National Research Foundation of Singapore
Keywords
Deep reinforcement learning; Continuous control; Metaheuristic; Swarm intelligence algorithm; Particle swarm optimization; Grey wolf optimizer; Neural networks
DOI
10.1016/j.swevo.2025.101920
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent studies, policy-based deep reinforcement learning (DRL) algorithms have exhibited superior performance on continuous control problems such as robotic arm control and robot gait learning. However, these algorithms frequently face challenges inherent in gradient descent-based weight optimization, including susceptibility to local optima, slow learning caused by saddle points, approximation errors, and suboptimal hyperparameters. The resulting instability leads to significant performance discrepancies among agent instances trained under identical settings, which complicates the practical application of reinforcement learning. To address this, we propose a metaheuristic-based weight optimization framework designed to mitigate learning instability in DRL for continuous control tasks. The proposed framework introduces a two-phase optimization process in which an additional search phase using swarm intelligence algorithms is conducted after the DRL learning phase. In numerical experiments on robot locomotion tasks, the proposed framework demonstrated superior and more stable performance than conventional DRL algorithms.
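The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no algorithmic details, so the particle initialization around the trained weights, the hyperparameter values, and the names `pso_refine` and `toy_return` are all assumptions made for illustration. The sketch treats the weights produced by the DRL phase as a flat vector and refines them with a textbook particle swarm optimization loop; `toy_return` is a hypothetical stand-in for the average episode return an agent would obtain with those weights.

```python
import random


def toy_return(weights):
    """Hypothetical stand-in for an episode return (higher is better)."""
    return -sum((w - 0.5) ** 2 for w in weights)


def pso_refine(initial_weights, fitness, n_particles=20, n_iters=50,
               inertia=0.7, c1=1.5, c2=1.5, spread=0.1, seed=0):
    """Search around already-trained weights with a basic PSO (maximization)."""
    rng = random.Random(seed)
    dim = len(initial_weights)

    # Assumption: particle 0 is the trained policy itself, the rest are
    # Gaussian perturbations of it, so the search never ends below phase one.
    positions = [list(initial_weights)]
    for _ in range(n_particles - 1):
        positions.append([w + rng.gauss(0.0, spread) for w in initial_weights])
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best start from the initial swarm.
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            fit = fitness(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


# Usage: pretend these four numbers came out of the DRL learning phase.
trained = [0.2, 0.9, -0.1, 0.4]
refined, refined_fit = pso_refine(trained, toy_return)
print(refined_fit >= toy_return(trained))  # True: particle 0 is the trained policy
```

With a deterministic fitness function the refined weights can never score below the starting point, because the trained weights are themselves a member of the swarm. In a real continuous control task the fitness would be a noisy return estimated from rollouts, so each evaluation would likely need to be averaged over several episodes.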
Pages: 14