Efficient Q-learning hyperparameter tuning using FOX optimization algorithm

Cited: 0
Authors
Jumaah, Mahmood A. [1 ]
Ali, Yossra H. [1 ]
Rashid, Tarik A. [2 ]
Affiliations
[1] Univ Technol Iraq, Dept Comp Sci, Al Sinaa St, Baghdad 10066, Iraq
[2] Univ Kurdistan Hewler, Dept Comp Sci & Engn, 30 Meter Ave, Erbil 44001, Iraq
Keywords
FOX optimization algorithm; Hyperparameter; Optimization; Q-learning; Reinforcement learning;
DOI
10.1016/j.rineng.2025.104341
CLC Number
T [Industrial Technology];
Subject Classification Code
08 ;
Abstract
Reinforcement learning is a branch of artificial intelligence in which agents learn optimal actions through interactions with their environment. Hyperparameter tuning is crucial for optimizing reinforcement learning algorithms, as it involves selecting parameters that significantly impact learning performance and reward. Conventional Q-learning relies on fixed hyperparameters without tuning throughout the learning process, an approach that is sensitive to the chosen values and can hinder optimal performance. In this paper, a new adaptive hyperparameter tuning method, called Q-learning-FOX (Q-FOX), is proposed. This method utilizes the FOX Optimizer-an optimization algorithm inspired by the hunting behaviour of red foxes-to adaptively optimize the learning rate (alpha) and discount factor (gamma) in Q-learning. Furthermore, a novel objective function is proposed that maximizes the average Q-values. The FOX optimizer utilizes this function to select the solutions with maximum fitness, thereby enhancing the optimization process. The effectiveness of the proposed method is demonstrated through evaluations conducted on two OpenAI Gym control tasks: Cart Pole and Frozen Lake. The proposed method achieved superior cumulative reward compared to established optimization algorithms, as well as to fixed and random hyperparameter tuning methods; the fixed and random methods represent conventional Q-learning. The proposed Q-FOX method consistently achieved an average cumulative reward of 500 (the maximum possible) for the Cart Pole task and 0.7389 for the Frozen Lake task across 30 independent runs, demonstrating a 23.37% higher average cumulative reward than conventional Q-learning using established optimization algorithms in both control tasks. Ultimately, the study demonstrates that Q-FOX is superior at adaptively tuning hyperparameters in Q-learning, outperforming established methods.
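The idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: a simple random candidate search stands in for the FOX optimizer, and a tiny hand-built table stands in for the Gym environments. The names (`q_update`, `select_hyperparameters`) and the search ranges for alpha and gamma are assumptions; only the update rule and the average-Q-value objective come from standard Q-learning and the paper's description.

```python
import random

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Standard tabular Q-learning update rule."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def average_q(Q):
    """Objective described in the abstract: mean of all Q-values (maximized)."""
    return sum(sum(row) for row in Q) / (len(Q) * len(Q[0]))

def select_hyperparameters(Q, s, a, r, s_next, n_candidates=20, rng=random):
    """Pick the (alpha, gamma) pair whose trial update yields the highest
    average Q-value. A random search stands in for the FOX optimizer here."""
    best, best_fit = (0.1, 0.9), float("-inf")
    for _ in range(n_candidates):
        alpha = rng.uniform(0.01, 1.0)   # assumed search range
        gamma = rng.uniform(0.5, 0.999)  # assumed search range
        trial = [row[:] for row in Q]    # copy the table for a trial update
        q_update(trial, s, a, r, s_next, alpha, gamma)
        fit = average_q(trial)
        if fit > best_fit:
            best, best_fit = (alpha, gamma), fit
    return best

# Toy 2-state, 2-action example: tune per step, then apply the real update.
rng = random.Random(0)
Q = [[0.0, 0.0], [0.0, 0.0]]
alpha, gamma = select_hyperparameters(Q, s=0, a=1, r=1.0, s_next=1, rng=rng)
q_update(Q, 0, 1, 1.0, 1, alpha, gamma)
```

In the paper the search is performed by the FOX algorithm rather than random sampling, and the objective is evaluated over the agent's Q-table during training; the structure above only shows how a per-step tuner plugs into the Q-learning loop.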
Pages: 14
Related papers
50 records
  • [41] A New Algorithm to Track Dynamic Goal Position in Q-learning
    Mitra, Soumishila
    Banerjee, Dhrubojyoti
    Konar, Amit
    Janarthanan, R.
    2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012, : 69 - 74
  • [42] A Path Planning Algorithm for UAV Based on Improved Q-Learning
    Yan, Chao
    Xiang, Xiaojia
    2018 2ND INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION SCIENCES (ICRAS), 2018, : 46 - 50
  • [43] Using sequential statistical tests for efficient hyperparameter tuning
    Buczak, Philip
    Groll, Andreas
    Pauly, Markus
    Rehof, Jakob
    Horn, Daniel
    ASTA-ADVANCES IN STATISTICAL ANALYSIS, 2024, 108 (02) : 441 - 460
  • [44] A Path Planning Algorithm for Space Manipulator Based on Q-Learning
    Li, Taiguo
    Li, Quanhong
    Li, Wenxi
    Xia, Jiagao
    Tang, Wenhua
    Wang, Weiwen
    PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 1566 - 1571
  • [45] Study on structural topology optimization of Q-learning cell method
    Song, Xuming
    Shi, Zheyu
    Bao, Shipeng
    Tang, Mian
JOURNAL OF RAILWAY SCIENCE AND ENGINEERING, 2024, 21 (08) : 3274 - 3285
  • [46] Detecting primary signals for efficient utilization of spectrum using Q-learning
    Reddy, Y. B.
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008, : 360 - 365
  • [47] A robot demonstration method based on LWR and Q-learning algorithm
    Zhao, Guangzhe
    Tao, Yong
    Liu, Hui
    Deng, Xianling
    Chen, Youdong
    Xiong, Hegen
    Xie, Xianwu
    Fang, Zengliang
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 35 (01) : 35 - 46
  • [49] Optimal scheduling in cloud healthcare system using Q-learning algorithm
    Li, Yafei
    Wang, Hongfeng
    Wang, Na
    Zhang, Tianhong
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (06) : 4603 - 4618
  • [50] A framework for co-evolutionary algorithm using Q-learning with meme
    Jiao, Keming
    Chen, Jie
    Xin, Bin
    Li, Li
    Zhao, Zhixin
    Zheng, Yifan
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 225