Enhanced Deep Deterministic Policy Gradient Algorithm Using Grey Wolf Optimizer for Continuous Control Tasks

Cited by: 6
Authors
Sumiea, Ebrahim Hamid Hasan [1, 2]
Abdulkadir, Said Jadid [1, 2]
Ragab, Mohammed Gamal [1, 2]
Al-Selwi, Safwan Mahmood [1, 2]
Fati, Suliman Mohamed [3]
Alqushaibi, Alawi [1, 2]
Alhussian, Hitham [1, 2]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Seri Iskandar 32610, Malaysia
[2] Univ Teknol PETRONAS, Ctr Res Data Sci CeRDaS, Seri Iskandar 32610, Malaysia
[3] Prince Sultan Univ, Informat Syst Dept, Riyadh 11586, Saudi Arabia
Keywords
Deep deterministic policy gradient; deep reinforcement learning; grey wolf optimization; hyperparameters optimization
DOI
10.1109/ACCESS.2023.3341507
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deep Reinforcement Learning (DRL) allows agents to make decisions in a specific environment based on a reward function, without prior knowledge. The choice of hyperparameters significantly affects the learning process and training time, and estimating them precisely during DRL training is a major challenge. To tackle this problem, this study uses Grey Wolf Optimization (GWO), a metaheuristic algorithm, to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm in order to achieve an optimal control strategy in two simulated Gymnasium environments provided by OpenAI. Accurately adapted hyperparameters contribute to faster convergence and enhanced learning, ultimately leading to more efficient control strategies. The proposed DDPG-GWO algorithm is evaluated in the 2DRobot and MountainCarContinuous simulation environments, chosen for their ease of implementation. The experimental results show that optimizing the hyperparameters of DDPG with the GWO algorithm in the Gymnasium environments maximizes the total rewards during testing episodes while ensuring the stability of the learning policy. This is evident when comparing the proposed DDPG-GWO agent, with optimized hyperparameters, against the original DDPG. In the 2DRobot environment, the original DDPG had rewards ranging from -150 to -50, whereas the proposed DDPG-GWO obtained rewards ranging from -100 to 100, with a running average between 1 and 800 across 892 episodes. In the MountainCarContinuous environment, the original DDPG struggled with negative rewards, while the proposed DDPG-GWO achieved rewards between 20 and 80 over 218 episodes with a total of 490 timesteps.
Pages: 139771-139784
Page count: 14
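The abstract describes using GWO to search over DDPG hyperparameters. The following is a minimal sketch of that idea, not the authors' implementation: it applies the standard GWO position update (three leader wolves, a coefficient that shrinks linearly from 2 toward 0) to a hypothetical stand-in objective. The function `toy_cost`, the three hyperparameters (log10 actor learning rate, log10 critic learning rate, discount factor), their bounds, and the population settings are all assumptions for illustration; in a real setup the objective would train a DDPG agent and return, for example, its negative mean test reward.

```python
import random


def toy_cost(params):
    """Hypothetical stand-in for 'negative mean reward after DDPG training'.

    params = [log10(actor_lr), log10(critic_lr), gamma]. The quadratic bowl
    below is an assumption; it is NOT derived from the paper.
    """
    log_lr_actor, log_lr_critic, gamma = params
    return (log_lr_actor + 4.0) ** 2 + (log_lr_critic + 3.0) ** 2 \
        + 100.0 * (gamma - 0.99) ** 2


def gwo_minimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimize `objective` over box `bounds` with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    # Initialise the pack uniformly inside the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_cost = None, float("inf")

    for it in range(n_iters):
        # Rank the pack; the three best wolves act as alpha, beta and delta.
        # Note: with a real DDPG objective, each evaluation is a training run.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        cost = objective(leaders[0])
        if cost < best_cost:
            best_pos, best_cost = list(leaders[0]), cost

        a = 2.0 * (1.0 - it / n_iters)  # exploration coefficient, 2 -> near 0
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * D
                # Average the three leader-guided estimates, then clip to bounds.
                new_pos.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(new_pos)
        wolves = new_wolves

    # Evaluate the final pack once more so the last update is not ignored.
    final = min(wolves, key=objective)
    if objective(final) < best_cost:
        best_pos, best_cost = list(final), objective(final)
    return best_pos, best_cost


# Assumed search ranges: learning rates in [1e-5, 1e-2], gamma in [0.90, 0.999].
BOUNDS = [(-5.0, -2.0), (-5.0, -2.0), (0.90, 0.999)]
best_pos, best_cost = gwo_minimize(toy_cost, BOUNDS)
print("best hyperparameters (log10 lr_actor, log10 lr_critic, gamma):", best_pos)
print("best cost:", best_cost)
```

Because each objective evaluation would correspond to a full training run, the population size and iteration count dominate the search cost; the small values here are chosen only to keep the sketch fast.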
Related papers
50 in total
  • [1] Continuous Control of a Robot Manipulator using Deep Deterministic Policy Gradient
    Shetty, Maithili
    Vishishta, Brunda
    Choragi, Shrinidhi
    Subramanian, Karpagavalli
    George, Koshy
    2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC), 2021, : 213 - 218
  • [2] Controlling Bicycle Using Deep Deterministic Policy Gradient Algorithm
    Le Pham Tuyen
    Chung, TaeChoong
    2017 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2017, : 413 - 417
  • [3] Deep deterministic policy gradient algorithm for UAV control
    Huang X.
    Liu J.
    Jia C.
    Wang Z.
    Zhang J.
    Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica, 2021, 42 (11)
  • [4] Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm
    Wang, Pin
    Li, Hanhan
    Chan, Ching-Yao
    2019 30TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV19), 2019, : 1454 - 1460
  • [5] Developing Flight Control Policy Using Deep Deterministic Policy Gradient
    Tsourdos, Antonios
    Permana, Adhi Dharma
    Budiarti, Dewi H.
    Shin, Hyo-Sang
    Lee, Chang-Hun
    2019 IEEE INTERNATIONAL CONFERENCE ON AEROSPACE ELECTRONICS AND REMOTE SENSING TECHNOLOGY (ICARES 2019), 2019
  • [6] Supervised integrated deep deterministic policy gradient model for enhanced control of chemical processes
    Zhang, Jiaxin
    Fan, Songdi
    Feng, Zemin
    Dong, Lichun
    Dai, Yiyang
    CHEMICAL ENGINEERING SCIENCE, 2025, 301
  • [7] Compensation Control of UAV Based on Deep Deterministic Policy Gradient
    Xu, Zijun
    Qi, Juntong
    Wang, Mingming
    Wu, Chong
    Yang, Guang
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 2289 - 2296
  • [8] Agent-Based Energy Sharing Mechanism Using Deep Deterministic Policy Gradient Algorithm
    Kuang, Yi
    Wang, Xiuli
    Zhao, Hongyang
    Huang, Yijun
    Chen, Xianlong
    Wang, Xifan
    ENERGIES, 2020, 13 (19)
  • [9] Composite deep learning control for autonomous bicycles by using deep deterministic policy gradient
    He, Kanghui
    Dong, Chaoyang
    Yan, An
    Zheng, Qingyuan
    Liang, Bin
    Wang, Qing
    IECON 2020: THE 46TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2020, : 2766 - 2773
  • [10] Deep Deterministic Policy Gradient Algorithm based Lateral and Longitudinal Control for Autonomous Driving
    Zhu Gongsheng
    Pei Chunmei
    Ding Jiang
    Shi Junfeng
    2020 5TH INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE 2020), 2020, : 736 - 741