Efficient Q-learning hyperparameter tuning using FOX optimization algorithm

Cited by: 0
Authors
Jumaah, Mahmood A. [1 ]
Ali, Yossra H. [1 ]
Rashid, Tarik A. [2 ]
Affiliations
[1] Univ Technol Iraq, Dept Comp Sci, Al Sinaa St, Baghdad 10066, Iraq
[2] Univ Kurdistan Hewler, Dept Comp Sci & Engn, 30 Meter Ave, Erbil 44001, Iraq
Keywords
FOX optimization algorithm; Hyperparameter; Optimization; Q-learning; Reinforcement learning;
DOI
10.1016/j.rineng.2025.104341
Chinese Library Classification: T [Industrial Technology]
Discipline code: 08
Abstract
Reinforcement learning is a branch of artificial intelligence in which agents learn optimal actions through interaction with their environment. Hyperparameter tuning is crucial for optimizing reinforcement learning algorithms and involves selecting parameters that can significantly affect learning performance and reward. Conventional Q-learning relies on fixed hyperparameters throughout the learning process, without tuning, which makes outcomes sensitive to their choice and can hinder optimal performance. In this paper, a new adaptive hyperparameter tuning method, called Q-learning-FOX (Q-FOX), is proposed. This method utilizes the FOX optimizer, an optimization algorithm inspired by the hunting behaviour of red foxes, to adaptively optimize the learning rate (alpha) and discount factor (gamma) in Q-learning. Furthermore, a novel objective function is proposed that maximizes the average Q-value; FOX uses this function to select the solutions with maximum fitness, thereby enhancing the optimization process. The effectiveness of the proposed method is demonstrated through evaluations on two OpenAI Gym control tasks: Cart Pole and Frozen Lake. The proposed method achieved superior cumulative reward compared with established optimization algorithms as well as with fixed and random hyperparameter tuning methods, where the fixed and random methods represent conventional Q-learning. Q-FOX consistently achieved an average cumulative reward of 500 (the maximum possible) on the Cart Pole task and 0.7389 on the Frozen Lake task across 30 independent runs, a 23.37% higher average cumulative reward than conventional Q-learning tuned with established optimization algorithms on both control tasks. Ultimately, the study demonstrates that Q-FOX is superior at adaptively tuning hyperparameters in Q-learning, outperforming established methods.
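The adaptive loop the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a simple random candidate search stands in for the FOX optimizer, and a hypothetical 6-state chain world stands in for the Gym tasks. Only the overall scheme is kept, i.e. per-episode selection of alpha and gamma by maximizing the average-Q-value objective.

```python
import random

# Sketch of Q-FOX-style adaptive hyperparameter tuning (assumptions: a tiny
# chain world replaces Cart Pole / Frozen Lake; random candidate search
# replaces the FOX optimizer; all names here are illustrative).
N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)  # move left / move right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def greedy(Q, s):
    # Break ties randomly so the untrained policy still explores
    return random.randrange(2) if Q[s][0] == Q[s][1] else (0 if Q[s][0] > Q[s][1] else 1)

def run_episode(Q, alpha, gamma, eps=0.1, max_steps=50):
    s = 0
    for _ in range(max_steps):
        a = random.randrange(2) if random.random() < eps else greedy(Q, s)
        s2, r, done = step(s, ACTIONS[a])
        # Standard Q-learning update using the candidate alpha and gamma
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

def mean_q(Q):
    # The paper's objective: maximize the average Q-value
    return sum(sum(row) for row in Q) / (N_STATES * len(ACTIONS))

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for episode in range(200):
    # Each episode, score candidate (alpha, gamma) pairs on a copy of the
    # Q-table and keep the fittest pair (random search standing in for FOX).
    best, best_fit = (0.5, 0.9), -1.0
    for _ in range(5):
        cand = (random.uniform(0.05, 1.0), random.uniform(0.5, 0.99))
        trial = [row[:] for row in Q]
        run_episode(trial, *cand)
        fit = mean_q(trial)
        if fit > best_fit:
            best, best_fit = cand, fit
    run_episode(Q, *best)  # learn for real with the selected hyperparameters

print(round(mean_q(Q), 3))
```

The key design point mirrored from the abstract is that the fitness of a candidate (alpha, gamma) pair is the average Q-value it produces, so the hyperparameters are re-selected adaptively every episode rather than fixed once.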
Pages: 14