Research on Air Confrontation Maneuver Decision-Making Method Based on Reinforcement Learning

Cited by: 50
Authors
Zhang, Xianbing [1]
Liu, Guoqing [1]
Yang, Chaojie [1]
Wu, Jiang [1]
Affiliation
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
Source
ELECTRONICS | 2018 / Vol. 7 / No. 11
Keywords
over-the-horizon air confrontation; maneuver decision; Q-Network; heuristic exploration; reinforcement learning; COMBAT;
DOI
10.3390/electronics7110279
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
As information technology advances, air confrontation is becoming increasingly intelligent, and the demand for automated intelligent decision-making systems is growing. Based on the characteristics of over-the-horizon air confrontation, this paper constructs an over-the-horizon air confrontation training environment comprising aircraft modeling, air confrontation scenario design, enemy aircraft strategy design, and reward and punishment signal design. To improve the efficiency with which the reinforcement learning algorithm explores the strategy space, the paper proposes a heuristic Q-Network method that integrates expert experience, using it as a heuristic signal to guide the search process while combining heuristic exploration with random exploration. For the over-the-horizon air confrontation maneuver decision problem, the heuristic Q-Network method is used to train a neural network model in this training environment; through continuous interaction with the environment, the model learns the air confrontation maneuver strategy by itself. Simulation experiments verify the efficiency of the heuristic Q-Network method and the effectiveness of the learned maneuver strategy.
Pages: 19
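The abstract describes combining heuristic (expert-guided) exploration with random exploration when choosing actions for a Q-Network. The record does not give the paper's actual selection rule, so the following is only a minimal sketch of one plausible way to mix the two. The function name `select_action`, the probabilities `p_random` and `p_heuristic`, and the idea that the expert suggestion arrives as a single action index are all assumptions made for illustration, not details taken from the paper.

```python
import random
from typing import Optional, Sequence


def select_action(
    q_values: Sequence[float],
    expert_action: int,
    p_heuristic: float,
    p_random: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index by mixing three behaviours (hypothetical scheme).

    With probability p_random    -> uniformly random action (random exploration)
    With probability p_heuristic -> the expert-suggested action (heuristic exploration)
    Otherwise                    -> the greedy action under the current Q-values
    """
    if p_heuristic < 0.0 or p_random < 0.0 or p_heuristic + p_random > 1.0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    if not 0 <= expert_action < len(q_values):
        raise ValueError("expert_action must be a valid action index")

    rng = rng if rng is not None else random.Random()
    u = rng.random()  # uniform draw in [0, 1)

    if u < p_random:
        return rng.randrange(len(q_values))
    if u < p_random + p_heuristic:
        return expert_action
    # Greedy choice: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a training loop one would typically start with larger values of `p_heuristic` and `p_random` and decay them over time so that the learned Q-values gradually take over; that schedule is likewise an assumption and is not described in this record.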