Efficient Q-learning hyperparameter tuning using FOX optimization algorithm

Times cited: 0

Authors:

Jumaah, Mahmood A. [1]

Ali, Yossra H. [1]

Rashid, Tarik A. [2]

Affiliations:

[1] Univ Technol Iraq, Dept Comp Sci, Al Sinaa St, Baghdad 10066, Iraq

[2] Univ Kurdistan Hewler, Dept Comp Sci & Engn, 30 Meter Ave, Erbil 44001, Iraq

Source:

RESULTS IN ENGINEERING | 2025, Vol. 25

Keywords:

FOX optimization algorithm; Hyperparameter; Optimization; Q-learning; Reinforcement learning

DOI:

10.1016/j.rineng.2025.104341

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Subject classification code:

08

Abstract:

Reinforcement learning is a branch of artificial intelligence in which agents learn optimal actions through interactions with their environment. Hyperparameter tuning is crucial for optimizing reinforcement learning algorithms; it involves selecting parameters that can significantly affect learning performance and reward. Conventional Q-learning relies on fixed hyperparameters that are not tuned during learning; because the outcome is sensitive to these values, this can hinder optimal performance. In this paper, a new adaptive hyperparameter tuning method, called Q-learning-FOX (Q-FOX), is proposed. This method uses the FOX optimizer, an optimization algorithm inspired by the hunting behaviour of red foxes, to adaptively optimize the learning rate (α) and discount factor (γ) in Q-learning. Furthermore, a novel objective function is proposed that maximizes the average Q-values. FOX uses this function to select the solutions with maximum fitness, thereby enhancing the optimization process. The effectiveness of the proposed method is demonstrated on two OpenAI Gym control tasks: Cart Pole and Frozen Lake. The proposed method achieved a higher cumulative reward than established optimization algorithms, as well as fixed and random hyperparameter tuning methods; the fixed and random methods represent conventional Q-learning. Across 30 independent runs, Q-FOX consistently achieved an average cumulative reward of 500 (the maximum possible) on the Cart Pole task and 0.7389 on the Frozen Lake task, a 23.37% higher average cumulative reward than conventional Q-learning and Q-learning tuned by established optimization algorithms across both control tasks. Ultimately, the study demonstrates that Q-FOX is a superior method for adaptively tuning hyperparameters in Q-learning, outperforming established methods.
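The abstract names two tuned hyperparameters, the learning rate α and the discount factor γ, which enter the standard tabular Q-learning update Q(s, a) ← Q(s, a) + α[r + γ·max_a′ Q(s′, a′) − Q(s, a)], and an objective that scores a candidate (α, γ) pair by the average of the learned Q-values. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hypothetical six-cell chain environment instead of Cart Pole or Frozen Lake, and it substitutes plain random search for the FOX optimizer, whose update rules are not given in this record. All names (`step`, `q_learning`, `average_q_fitness`, `tune`) and default settings are hypothetical.

```python
import random
from statistics import mean

# Hypothetical toy environment (NOT Cart Pole or Frozen Lake): a 1-D chain of
# N_STATES cells. The agent starts in cell 0; reaching the last cell gives
# reward 1.0 and ends the episode. All other transitions give reward 0.0.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(alpha, gamma, episodes=200, epsilon=0.1, max_steps=50, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)  # fixed seed so each (alpha, gamma) is scored reproducibly
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Explore with probability epsilon; also pick randomly on ties
            # (this tie check assumes exactly two actions).
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[s][act])
            s2, r, done = step(s, a)
            # Bootstrapped target: no future value after a terminal transition.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def average_q_fitness(alpha, gamma):
    """Objective in the spirit of the abstract: mean of all learned Q-values."""
    q = q_learning(alpha, gamma)
    return mean(v for row in q for v in row)


def tune(n_candidates=20, seed=1):
    """Stand-in search (plain random search, NOT FOX): keep the fittest pair."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        alpha = rng.uniform(0.01, 1.0)
        gamma = rng.uniform(0.0, 0.99)
        fit = average_q_fitness(alpha, gamma)
        if best is None or fit > best[0]:
            best = (fit, alpha, gamma)
    return best  # (fitness, alpha, gamma)


if __name__ == "__main__":
    fitness, alpha, gamma = tune()
    print(f"best alpha={alpha:.3f}, gamma={gamma:.3f}, avg Q={fitness:.4f}")
```

Calling `tune()` returns `(fitness, alpha, gamma)` for the best sampled pair. In a FOX-based setup, the optimizer would replace the random sampling loop inside `tune` while `average_q_fitness` remains the fitness function. Note that in this toy environment an average-Q objective tends to favour larger γ, because discounting less raises every bootstrapped value.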


Pages: 14
