Surrogate-Assisted Evolutionary Q-Learning for Black-Box Dynamic Time-Linkage Optimization Problems

Cited by: 8

Authors:

Zhang, Tuo ^{[1]}

Wang, Handing ^{[1]}

Yuan, Bo ^{[2]}

Jin, Yaochu ^{[3,4]}

Yao, Xin ^{[2]}

Affiliations:

[1] Xidian Univ, Sch Artificial Intelligence & Collaborat Innovat, Xian 710071, Peoples R China

[2] Southern Univ Sci & Technol, Guangdong Prov Key Lab Brain Inspired Intelligent, Shenzhen 518055, Peoples R China

[3] Bielefeld Univ, Fac Technol, Chair Nat Inspired Comp & Engn, D-33615 Bielefeld, Germany

[4] Univ Surrey, Dept Comp Sci, Guildford GU2 7XH, England

Source:

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION | 2023 / Vol. 27 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

Black-box problem; dynamic time-linkage optimization problem (DTP); evolutionary dynamic optimization (EDO); Q-learning; surrogate model; ALGORITHMS; MEMORY;

DOI:

10.1109/TEVC.2022.3179256

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Dynamic time-linkage optimization problems (DTPs) are special dynamic optimization problems (DOPs) with the time-linkage property. The environment of a DTP not only changes over time but also depends on the previously applied solutions. Existing dynamic evolutionary algorithms can hardly solve DTPs because they ignore the time-linkage property. In fact, DTPs can be viewed as multiple decision-making problems and solved by reinforcement learning (RL). However, RL-based evolutionary optimization algorithms have so far solved only some discrete DTPs, under the assumption of observable objective functions. In this work, we propose a dynamic evolutionary optimization algorithm using surrogate-assisted Q-learning for continuous black-box DTPs. To observe the states of black-box DTPs, state extraction and prediction methods are applied after the search process at each time step. Based on the learned information, surrogate-assisted Q-learning is introduced to evaluate and select candidate solutions in the continuous decision space with long-term performance in mind. We evaluate the components of the proposed algorithm on various benchmark problems to study their behaviors. Results of comparative experiments indicate that the proposed algorithm outperforms the compared algorithms and performs robustly on DTPs with up to 30 decision variables and different dynamic changes.

Citation

Pages: 1162-1176

Page count: 15

References: 60 in total

