Deep reinforcement learning framework and algorithms integrated with cognitive behavior models

Cited by: 0
Authors
Chen H. [1 ]
Li J.-X. [1 ]
Huang J. [1 ]
Wang C. [1 ]
Liu Q. [1 ]
Zhang Z.-J. [1 ]
Affiliations
[1] College of Intelligence Science and Technology, National University of Defense Technology, Changsha
Source
Kongzhi yu Juece/Control and Decision | 2023 / Vol. 38 / No. 11
Keywords
air combat maneuver; BDI; cognitive behavior model; DQN; GOAL; PPO; reinforcement learning
DOI
10.13195/j.kzyjc.2022.0281
Abstract
When facing complex tasks with high-dimensional continuous state spaces or sparse rewards, it is difficult for a reinforcement learning agent to learn an optimal policy from scratch. How to represent prior knowledge in a form understandable to both humans and the learning agent, and to effectively accelerate policy convergence, remains a difficult problem. Therefore, this paper proposes a deep reinforcement learning (DRL) framework integrated with cognitive behavior models. It represents prior knowledge as belief-desire-intention (BDI) based cognitive behavior models, which are used to guide policy learning in DRL. Based on the proposed framework, we introduce the deep Q-learning algorithm with a cognitive behavior model (COG-DQN) and the proximal policy optimization algorithm with a cognitive behavior model (COG-PPO). Moreover, we quantitatively design the strategies by which the cognitive behavior model guides policy updates. Finally, in a typical Gym environment and an air combat maneuver confrontation environment, we verify that the proposed algorithms can efficiently use the cognitive behavior model to accelerate policy learning and significantly alleviate the impact of high-dimensional state spaces and sparse rewards. © 2023 Northeast University. All rights reserved.
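The abstract does not reproduce the algorithmic details of COG-DQN or COG-PPO. As a minimal sketch of the general idea, assuming a rule-based BDI model exposing a hypothetical intend(state) method and a hypothetical mixing probability guidance_prob (neither name comes from the paper), cognitive-model-guided exploration in a DQN-style agent might look like this:

```python
import random

# Illustrative sketch only: a toy BDI-style model whose beliefs (state
# features) trigger a desire, which yields an intended action. The rule
# below ("evade when threat is high") is an invented example, not the
# paper's air combat maneuver model.
class CognitiveBehaviorModel:
    def intend(self, state):
        # Belief: threat level; desire: evade vs. pursue; intention: action id.
        return 0 if state.get("threat", 0.0) > 0.5 else 1

def select_action(q_values, state, model, epsilon=0.1, guidance_prob=0.3):
    """Epsilon-greedy action selection biased by the cognitive model.

    With probability guidance_prob, the exploratory action is drawn from
    the BDI model's intention instead of uniformly at random, so prior
    knowledge shapes exploration while Q-learning remains the learner.
    """
    if random.random() < epsilon:                # explore...
        if random.random() < guidance_prob:      # ...guided by the model,
            return model.intend(state)
        return random.randrange(len(q_values))   # ...or uniformly.
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# Usage: two actions, a high-threat state; the model biases exploration
# toward the evasive action while exploitation still follows Q-values.
model = CognitiveBehaviorModel()
print(select_action([0.2, 0.8], {"threat": 0.9}, model))
```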
Pages: 3209-3218
Page count: 9
Related papers
22 in total
  • [1] Kakade S M., On the sample complexity of reinforcement learning, (2003)
  • [2] Sutton R S, Barto A G., Reinforcement learning: An introduction, IEEE Transactions on Neural Networks, 9, 5, (1998)
  • [3] Taylor M E, Stone P., Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, 10, 7, pp. 1633-1685, (2009)
  • [4] Da Silva F L, Costa A H R., A survey on transfer learning for multiagent reinforcement learning systems, Journal of Artificial Intelligence Research, 64, pp. 645-703, (2019)
  • [5] Yang T P, Hao J Y, Meng Z P, et al., Towards efficient detection and optimal response against sophisticated opponents, Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 623-629, (2019)
  • [6] Chen H, Liu Q, Huang J, et al., Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach, Applied Soft Computing, 121, (2022)
  • [7] Ammar H B, Eaton E, Taylor M E, et al., An automated measure of MDP similarity for transfer in reinforcement learning
  • [8] Song J H, Gao Y, Wang H, et al., Measuring the distance between finite Markov decision processes, Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 468-476, (2016)
  • [9] Brys T, Harutyunyan A, Taylor M E, et al., Policy transfer using reward shaping, Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, pp. 181-188, (2015)
  • [10] Bianchi R A C, Martins M F, Ribeiro C H C, et al., Heuristically-accelerated multiagent reinforcement learning, IEEE Transactions on Cybernetics, 44, 2, pp. 252-265, (2014)