Research on Air Confrontation Maneuver Decision-Making Method Based on Reinforcement Learning

Cited by: 50

Authors:

Zhang, Xianbing [1]

Liu, Guoqing [1]

Yang, Chaojie [1]

Wu, Jiang [1]

Affiliation:

[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China

Source:

ELECTRONICS | 2018, Vol. 7, Issue 11

Keywords:

over-the-horizon air confrontation; maneuver decision; Q-Network; heuristic exploration; reinforcement learning; COMBAT;

DOI:

10.3390/electronics7110279

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

With the development of information technology, air confrontation is becoming increasingly intelligent, and the demand for automated intelligent decision-making systems is growing. Based on the characteristics of over-the-horizon air confrontation, this paper constructs an over-the-horizon air confrontation training environment, which comprises aircraft modeling, air confrontation scene design, enemy aircraft strategy design, and reward and punishment signal design. To improve the efficiency with which the reinforcement learning algorithm explores the strategy space, the paper proposes a heuristic Q-Network method that integrates expert experience, using that experience as a heuristic signal to guide the search process while combining heuristic exploration with random exploration. For the over-the-horizon air confrontation maneuver decision problem, the heuristic Q-Network method is used to train the neural network model in this training environment. Through continuous interaction with the environment, the model learns the air confrontation maneuver strategy on its own. Simulation experiments verify the efficiency of the heuristic Q-Network method and the effectiveness of the learned maneuver strategy.
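The abstract says that heuristic exploration (guided by expert experience) is combined with random exploration, but this record does not give the exact rule. The following is only a minimal sketch of one plausible way to mix the two with greedy selection over Q-values. The function name `select_action`, the probabilities `p_heuristic` and `p_random`, and the idea of passing the expert's suggestion as a single action index are all assumptions made for illustration, not the authors' method.

```python
import random
from typing import Sequence


def select_action(
    q_values: Sequence[float],
    expert_action: int,
    p_heuristic: float,
    p_random: float,
    rng: random.Random,
) -> int:
    """Pick an action index by mixing three modes (hypothetical scheme).

    - with probability p_heuristic: follow the expert's suggested action
    - with probability p_random:    pick a uniformly random action
    - otherwise:                    pick the action with the largest Q-value
    """
    if p_heuristic < 0 or p_random < 0 or p_heuristic + p_random > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    if not 0 <= expert_action < len(q_values):
        raise ValueError("expert_action must be a valid action index")

    u = rng.random()  # uniform draw in [0, 1)
    if u < p_heuristic:
        return expert_action  # heuristic exploration
    if u < p_heuristic + p_random:
        return rng.randrange(len(q_values))  # random exploration
    # Greedy exploitation: index of the maximum Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a training loop one would typically shrink both probabilities over time so that the learned Q-values gradually take over from the expert signal; whether the paper uses such a schedule is not stated here.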

Pages: 19

References: 18 in total