Enhanced Deep Deterministic Policy Gradient Algorithm Using Grey Wolf Optimizer for Continuous Control Tasks

Cited by: 6

Authors:

Sumiea, Ebrahim Hamid Hasan [1,2]

Abdulkadir, Said Jadid [1,2]

Ragab, Mohammed Gamal [1,2]

Al-Selwi, Safwan Mahmood [1,2]

Fati, Suliman Mohamed [3]

Alqushaibi, Alawi [1,2]

Alhussian, Hitham [1,2]

Affiliations:

[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Seri Iskandar 32610, Malaysia

[2] Univ Teknol PETRONAS, Ctr Res Data Sci CeRDaS, Seri Iskandar 32610, Malaysia

[3] Prince Sultan Univ, Informat Syst Dept, Riyadh 11586, Saudi Arabia

Source:

IEEE Access | 2023 / Vol. 11

Keywords:

Deep deterministic policy gradient; deep reinforcement learning; grey wolf optimization; hyperparameters optimization;

DOI:

10.1109/ACCESS.2023.3341507

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep Reinforcement Learning (DRL) allows agents to make decisions in a specific environment based on a reward function, without prior knowledge. The choice of hyperparameters significantly affects both the learning process and the training time, and estimating hyperparameters precisely during DRL training poses a major challenge. To tackle this problem, this study uses Grey Wolf Optimization (GWO), a metaheuristic algorithm, to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm in order to achieve an optimal control strategy in two simulated Gymnasium environments provided by OpenAI. Accurately adapted hyperparameters contribute to faster convergence and enhanced learning, ultimately leading to more efficient control strategies. The proposed DDPG-GWO algorithm is evaluated in the 2DRobot and MountainCarContinuous simulation environments, chosen for their ease of implementation. Our experimental results reveal that optimizing the hyperparameters of DDPG with the GWO algorithm in the Gymnasium environments maximizes the total rewards during testing episodes while ensuring the stability of the learned policy. This is evident when comparing the proposed DDPG-GWO agent, which uses optimized hyperparameters, with the original DDPG. In the 2DRobot environment, the original DDPG obtained rewards ranging from -150 to -50, whereas the proposed DDPG-GWO obtained rewards ranging from -100 to 100, with a running average between 1 and 800 across 892 episodes. In the MountainCarContinuous environment, the original DDPG struggled with negative rewards, while the proposed DDPG-GWO achieved rewards between 20 and 80 over 218 episodes with a total of 490 timesteps.
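
The abstract describes using GWO to search over DDPG hyperparameters. As a rough illustration of that idea only, the sketch below implements the standard GWO position update (three leading wolves, a coefficient `a` decreasing linearly from 2 to 0) in plain Python. It is not the authors' implementation: the function names, the search bounds, and in particular `toy_objective` are assumptions made for this example. In a real setup the objective would train a DDPG agent with the candidate hyperparameters and return, for instance, the negative mean episode reward.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=8, n_iters=40, seed=0):
    """Minimal Grey Wolf Optimizer sketch; returns (best_position, best_score)."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for it in range(n_iters):
        # Rank the pack; the three best wolves (alpha, beta, delta) guide the rest.
        scored = sorted(((objective(w), w) for w in wolves), key=lambda t: t[0])
        if scored[0][0] < best_score:
            best_score, best_pos = scored[0][0], list(scored[0][1])
        leaders = [list(w) for _, w in scored[:3]]

        a = 2.0 * (1 - it / n_iters)  # decreases linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a       # exploration/exploitation weight
                    C = 2 * rng.random()               # random emphasis on the leader
                    D = abs(C * leader[d] - w[d])      # distance to the leader
                    estimate += leader[d] - A * D
                lo, hi = bounds[d]
                pos.append(min(max(estimate / 3, lo), hi))  # average, then clip to bounds
            new_wolves.append(pos)
        wolves = new_wolves

    # Score the final pack so the last update is not ignored.
    for w in wolves:
        score = objective(w)
        if score < best_score:
            best_score, best_pos = score, list(w)
    return best_pos, best_score


def toy_objective(x):
    """Hypothetical stand-in for 'train DDPG, return negative mean reward'."""
    log_actor_lr, log_critic_lr, gamma = x
    return (log_actor_lr + 3) ** 2 + (log_critic_lr + 3) ** 2 + (gamma - 0.99) ** 2


# Assumed search space: log10 learning rates and the discount factor.
BOUNDS = [(-5.0, -1.0), (-5.0, -1.0), (0.9, 0.999)]

best, score = gwo_minimize(toy_objective, BOUNDS)
print("best hyperparameters (log10 actor lr, log10 critic lr, gamma):", best)
print("objective value:", score)
```

Because each objective evaluation would correspond to a full DDPG training run, the pack size and iteration count directly set the compute budget, which is why both are kept small here.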


Pages: 139771-139784

Number of pages: 14
