Enhanced Deep Deterministic Policy Gradient Algorithm Using Grey Wolf Optimizer for Continuous Control Tasks

Cited by: 11
Authors
Sumiea, Ebrahim Hamid Hasan [1 ,2 ]
Abdulkadir, Said Jadid [1 ,2 ]
Ragab, Mohammed Gamal [1 ,2 ]
Al-Selwi, Safwan Mahmood [1 ,2 ]
Fati, Suliman Mohamed [3]
Alqushaibi, Alawi [1 ,2 ]
Alhussian, Hitham [1 ,2 ]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Seri Iskandar 32610, Malaysia
[2] Univ Teknol PETRONAS, Ctr Res Data Sci CeRDaS, Seri Iskandar 32610, Malaysia
[3] Prince Sultan Univ, Informat Syst Dept, Riyadh 11586, Saudi Arabia
Keywords
Deep deterministic policy gradient; deep reinforcement learning; grey wolf optimization; hyperparameters optimization;
DOI
10.1109/ACCESS.2023.3341507
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Deep Reinforcement Learning (DRL) allows agents to make decisions in a specific environment based on a reward function, without prior knowledge. The choice of hyperparameters significantly impacts the learning process and training time, and precise estimation of hyperparameters during DRL training poses a major challenge. To tackle this problem, this study utilizes Grey Wolf Optimization (GWO), a metaheuristic algorithm, to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm for achieving an optimal control strategy in two simulated Gymnasium environments provided by OpenAI. The ability to adapt hyperparameters accurately contributes to faster convergence and enhanced learning, ultimately leading to more efficient control strategies. The proposed DDPG-GWO algorithm is evaluated in the 2DRobot and MountainCarContinuous simulation environments, chosen for their ease of implementation. Our experimental results reveal that optimizing the hyperparameters of the DDPG using the GWO algorithm in the Gymnasium environments maximizes the total rewards during testing episodes while ensuring the stability of the learning policy. This is evident when comparing our proposed DDPG-GWO agent, with optimized hyperparameters, against the original DDPG. In the 2DRobot environment, the original DDPG had rewards ranging from -150 to -50, whereas in the proposed DDPG-GWO they ranged from -100 to 100, with a running average between 1 and 800 across 892 episodes. In the MountainCarContinuous environment, the original DDPG struggled with negative rewards, while the proposed DDPG-GWO achieved rewards between 20 and 80 over 218 episodes with a total of 490 timesteps.
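The core idea the abstract describes — treating DDPG hyperparameters as a search space and letting GWO's alpha/beta/delta leadership dynamics drive candidates toward configurations with higher reward — can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names (`gwo_minimize`, `fake_ddpg_score`) and the quadratic stand-in objective are hypothetical; in the paper, the objective would be a full DDPG training run returning a (negated) total reward.

```python
import random

def gwo_minimize(objective, bounds, n_wolves=12, n_iters=50, seed=0):
    """Minimal Grey Wolf Optimizer: wolves move toward the three best
    solutions (alpha, beta, delta) with a step size that anneals to zero."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]

    def clip(x, d):
        lo, hi = bounds[d]
        return max(lo, min(hi, x))

    for t in range(n_iters):
        # Snapshot the three best wolves as this iteration's leaders.
        leaders = [list(w) for w in sorted(wolves, key=objective)[:3]]
        a = 2 - 2 * t / n_iters  # exploration coefficient, anneals 2 -> 0
        for w in wolves:
            for d in range(dim):
                pos = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])  # distance to this leader
                    pos += leader[d] - A * D
                w[d] = clip(pos / 3, d)  # average of the three pulls
    best = min(wolves, key=objective)
    return best, objective(best)

# Toy stand-in for the expensive inner loop ("train DDPG with these
# hyperparameters, return the negated reward"): a quadratic bowl around a
# hypothetical optimum of actor_lr = 1e-3, gamma = 0.99.
def fake_ddpg_score(hp):
    actor_lr, gamma = hp
    return (actor_lr - 1e-3) ** 2 + (gamma - 0.99) ** 2

best_hp, best_score = gwo_minimize(fake_ddpg_score,
                                   bounds=[(1e-5, 1e-2), (0.90, 0.999)])
```

In a real DDPG-GWO setup, each objective evaluation is a training run, so the wolf count and iteration budget are kept small relative to typical metaheuristic settings.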
Pages: 139771-139784
Page count: 14
Related Papers (50 in total)
[31]   Perception Enhanced Deep Deterministic Policy Gradient for Autonomous Driving in Complex Scenarios [J].
Liao, Lyuchao ;
Xiao, Hankun ;
Xing, Pengqi ;
Gan, Zhenhua ;
He, Youpeng ;
Wang, Jiajun .
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2024, 140 (01) :557-576
[32]   Deep deterministic policy gradient algorithm for crowd-evacuation path planning [J].
Li, Xinjin ;
Liu, Hong ;
Li, Junqing ;
Li, Yan .
COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 161
[33]   Deep Deterministic Policy Gradient-Based Algorithm for Computation Offloading in IoV [J].
Li, Haofei ;
Chen, Chen ;
Shan, Hangguan ;
Li, Pu ;
Chang, Yoong Choon ;
Song, Houbing .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (03) :2522-2533
[34]   Cooperative Control of Power Grid Frequency Based on Expert-Guided Deep Deterministic Policy Gradient Algorithm [J].
Shen, Tao ;
Zhang, Jing ;
He, Yu ;
Yang, Shengsun ;
Zhang, Demu ;
Yang, Zhaorui .
IEEE ACCESS, 2025, 13 :38502-38514
[35]   Robust control for anaerobic digestion systems of Tequila vinasses under uncertainty: A Deep Deterministic Policy Gradient Algorithm [J].
Mendiola-Rodriguez, Tannia A. ;
Ricardez-Sandoval, Luis A. .
DIGITAL CHEMICAL ENGINEERING, 2022, 3
[36]   Energy Scheduling of Hydrogen Hybrid UAV Based on Model Predictive Control and Deep Deterministic Policy Gradient Algorithm [J].
Li, Haitao ;
Wang, Chenyu ;
Yuan, Shufu ;
Zhu, Hui ;
Li, Bo ;
Liu, Yuexin ;
Sun, Li .
ALGORITHMS, 2025, 18 (02)
[37]   Sparse Variational Deterministic Policy Gradient for Continuous Real-Time Control [J].
Baek, Jongchan ;
Jun, Hayoung ;
Park, Jonghyuk ;
Lee, Hakjun ;
Han, Soohee .
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2021, 68 (10) :9800-9810
[38]   DEEP DETERMINISTIC POLICY GRADIENT WITH GENERALIZED INTEGRAL COMPENSATOR FOR HEIGHT CONTROL OF QUADROTOR [J].
Liu, Anlin ;
Liu, Lei ;
Cao, Jinde ;
Alsaadi, Fawaz E. .
JOURNAL OF APPLIED ANALYSIS AND COMPUTATION, 2022, 12 (03) :868-894
[39]   Safe reinforcement learning-based control using deep deterministic policy gradient algorithm and slime mould algorithm with experimental tower crane system validation [J].
Zamfirache, Iuliu Alexandru ;
Precup, Radu-Emil ;
Petriu, Emil M. .
INFORMATION SCIENCES, 2025, 692
[40]   Rank Selection Method of CP Decomposition Based on Deep Deterministic Policy Gradient Algorithm [J].
Zhang, Shaoshuang ;
Li, Zhao ;
Liu, Wenlong ;
Zhao, Jiaqi ;
Qin, Ting .
IEEE ACCESS, 2024, 12 :97374-97385