Adaptive Noise-based Evolutionary Reinforcement Learning With Maximum Entropy

Cited by: 0
Authors
Wang J.-Y. [1]
Wang Z. [1]
Li H.-X. [1]
Chen C.-L. [1]
Affiliations
[1] Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2023, Vol. 49, Issue 01
Funding
National Natural Science Foundation of China
Keywords
adaptive noise; deep reinforcement learning; evolution strategies; evolutionary reinforcement learning; maximum entropy
DOI
10.16383/j.aas.c220103
Abstract
Recently, evolution strategies have been widely investigated in deep reinforcement learning for their promising properties of derivative-free optimization and high parallelization efficiency. However, traditional evolutionary reinforcement learning methods suffer from several problems, including slow learning, a tendency toward local optima, and poor robustness. A systematic method, named adaptive noise-based evolutionary reinforcement learning with maximum entropy, is proposed to tackle these problems. First, the canonical evolution strategy is introduced to strengthen the influence of well-behaved individuals and weaken that of poorly performing ones, thus improving the learning speed of evolutionary reinforcement learning. Second, a regularization term that maximizes the policy entropy is incorporated into the objective function, which keeps actions moderately stochastic and encourages exploration of new promising solutions. Third, the exploration noise is designed to adapt automatically to the current evolutionary situation, which reduces the dependence on prior knowledge and improves the robustness of evolution. Experimental results show that, compared with traditional approaches, this method achieves faster learning, better convergence to global optima, and improved robustness. © 2023 Science Press. All rights reserved.
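The abstract names three mechanisms: canonical ES rank weighting, an entropy-regularized objective, and exploration noise that adapts to the evolutionary situation. Below is a minimal, self-contained sketch of one generation combining the three, assuming a toy fitness in place of real policy rollouts and a 1/5th-success-style adaptation rule; all names (toy_fitness, ces_weights, alpha) are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumed, not the paper's code) of one generation of
# canonical ES with entropy-regularized fitness and adaptive noise.
import numpy as np

rng = np.random.default_rng(0)

def ces_weights(mu):
    # Canonical ES log-rank weights: the best offspring gets the largest
    # weight, the bottom lam - mu offspring contribute nothing.
    raw = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    return raw / raw.sum()

def toy_fitness(theta, alpha=0.01):
    # Stand-in for "episodic return + alpha * policy entropy"; a smooth
    # toy function replaces real rollouts so the sketch runs on its own.
    pretend_return = -np.sum((theta - 1.0) ** 2)
    pretend_entropy = -0.1 * np.sum(theta ** 2)
    return pretend_return + alpha * pretend_entropy

def step(theta, sigma, lam=20, mu=5, target=0.2, scale=1.1):
    parent_score = toy_fitness(theta)
    eps = rng.standard_normal((lam, theta.size))   # Gaussian perturbations
    scores = np.array([toy_fitness(theta + sigma * e) for e in eps])
    top = np.argsort(scores)[::-1][:mu]            # indices of the best mu
    w = ces_weights(mu)
    theta = theta + sigma * (w[:, None] * eps[top]).sum(axis=0)
    # Assumed adaptation in the spirit of the 1/5th success rule: widen
    # the noise when many offspring beat the parent, otherwise shrink it.
    success_rate = (scores > parent_score).mean()
    sigma = sigma * scale if success_rate > target else sigma / scale
    return theta, sigma

theta, sigma = np.zeros(8), 0.5
for _ in range(100):
    theta, sigma = step(theta, sigma)
print(theta.round(2), round(sigma, 3))  # theta drifts toward the optimum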
Pages: 54-66
Page count: 12