An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm

Cited by: 25
Authors
Zhao, Fuqing [1 ]
Wang, Qiaoyun [1 ]
Wang, Ling [2 ]
Affiliations
[1] Lanzhou Univ Technol, Sch Comp & Commun, Lanzhou 730050, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Inverse reinforcement learning (IRL); Q-learning; Metaheuristic algorithm; Moth-flame optimization algorithm (MFO); Competition mechanism;
DOI
10.1016/j.knosys.2023.110368
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A reward function is learned from expert examples by inverse reinforcement learning (IRL), which is more reliable than hand-crafting one. The moth-flame optimization algorithm (MFO), inspired by the navigation mechanism of moths flying at night, has been extensively employed to address complex optimization problems. An inverse reinforcement learning framework with a Q-learning mechanism (IRLMFO) is designed to strengthen the performance of the MFO algorithm on large-scale real-parameter optimization problems. The Q-learning mechanism selects an appropriate strategy from a strategy pool, which stores strategies with diverse functions, using historical data provided by the corresponding approach. A competition mechanism is designed to strengthen the exploitation capability of the IRLMFO algorithm. The performance of IRLMFO is verified on the CEC 2017 benchmark test suite. Experimental results show that IRLMFO outperforms state-of-the-art algorithms. (c) 2023 Elsevier B.V. All rights reserved.
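The abstract's core idea of selecting search strategies from a pool via Q-learning can be sketched as follows. This is a minimal illustration, not the authors' implementation: the strategy pool (two perturbation operators), the single-state formulation, the sphere objective, and the improvement-based reward are all illustrative assumptions.

```python
import random

random.seed(0)

def sphere(x):
    """Illustrative objective (minimize): sum of squares."""
    return sum(v * v for v in x)

# Hypothetical strategy pool: each strategy perturbs a candidate solution.
def small_step(x):
    return [v + random.uniform(-0.1, 0.1) for v in x]

def large_step(x):
    return [v + random.uniform(-1.0, 1.0) for v in x]

STRATEGIES = [small_step, large_step]

def q_learning_search(iters=500, dim=5, alpha=0.1, gamma=0.9, eps=0.2):
    # Single-state Q-table: one Q-value per strategy in the pool.
    q = [0.0] * len(STRATEGIES)
    x = [random.uniform(-5, 5) for _ in range(dim)]
    best = sphere(x)
    for _ in range(iters):
        # Epsilon-greedy selection over the strategy pool.
        if random.random() < eps:
            a = random.randrange(len(STRATEGIES))
        else:
            a = max(range(len(STRATEGIES)), key=lambda i: q[i])
        cand = STRATEGIES[a](x)
        f = sphere(cand)
        # Reward improvement, penalize failure (assumed reward design).
        reward = 1.0 if f < best else -0.1
        if f < best:
            best, x = f, cand
        # Q-update: Q(a) += alpha * (r + gamma * max_a' Q(a') - Q(a)).
        q[a] += alpha * (reward + gamma * max(q) - q[a])
    return best

print(q_learning_search())
```

Over time the Q-values steer selection toward whichever operator has recently produced improvements, which is the general mechanism the abstract describes; the paper's actual design additionally learns the reward function itself via IRL rather than fixing it by hand.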
Pages: 25
Related references
53 entries
[1] Abbeel P., 2004, Apprenticeship learning via inverse reinforcement learning.
[2] Abdulali, Arsen; Jeon, Seokhee. Data-driven Haptic Modeling of Plastic Flow via Inverse Reinforcement Learning. 2021 IEEE World Haptics Conference (WHC), 2021: 115-120.
[3] Abualigah, Laith; Yousri, Dalia; Abd Elaziz, Mohamed; Ewees, Ahmed A.; Al-qaness, Mohammed A. A.; Gandomi, Amir H. Aquila Optimizer: A novel meta-heuristic optimization algorithm. Computers & Industrial Engineering, 2021, 157.
[4] Akbay, Mehmet Anil; Kalayci, Can B.; Polat, Olcay. A parallel variable neighborhood search algorithm with quadratic programming for cardinality constrained portfolio optimization. Knowledge-Based Systems, 2020, 198.
[5] Arora, Saurabh; Doshi, Prashant. A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence, 2021, 297.
[6] Chakraborty, Sanjoy; Saha, Apu Kumar; Chakraborty, Ratul; Saha, Moumita. An enhanced whale optimization algorithm for large scale optimization problems. Knowledge-Based Systems, 2021, 233.
[7] Han, Honggui; Bai, Xing; Han, Huayun; Hou, Ying; Qiao, Junfei. Self-Adjusting Multitask Particle Swarm Optimization. IEEE Transactions on Evolutionary Computation, 2022, 26(1): 145-158.
[8] Hashim, Fatma A.; Hussien, Abdelazim G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowledge-Based Systems, 2022, 242.
[9] Huang, Zhiyu; Wu, Jingda; Lv, Chen. Driving Behavior Modeling Using Naturalistic Human Driving Data With Inverse Reinforcement Learning. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 10239-10251.
[10] Huynh, Thanh N.; Do, Dieu T. T.; Lee, Jaehong. Q-Learning-based parameter control in differential evolution for structural optimization. Applied Soft Computing, 2021, 107.