Hierarchical Reinforcement Learning for Air Combat at DARPA's AlphaDogfight Trials

Cited by: 22
Authors
Pope A.P. [1,2]
Ide J.S. [1]
Mićović D. [1]
Diaz H. [1]
Twedt J.C. [1]
Alcedo K. [1]
Walker T.T. [1]
Rosenbluth D. [1]
Ritholtz L. [1]
Javorsek D. [3]
Affiliations
[1] Applied AI Team, Lockheed Martin Artificial Intelligence Center, Shelton, CT 06484, USA
[2] Primordial Labs, New Haven, CT 06510, USA
[3] Nellis Air Force Base, United States Air Force, Las Vegas, NV 89191, USA
Source
IEEE Transactions on Artificial Intelligence, vol. 4, no. 6, 2023
Keywords
Air combat; artificial intelligence; autonomy; deep reinforcement learning; hierarchical reinforcement learning
DOI: 10.1109/TAI.2022.3222143
Abstract
Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in robotics and artificial intelligence. Because of the high risk and complexity involved, the adoption of AI for autonomous combat systems has been a long-standing challenge. To address these issues, DARPA's AlphaDogfight Trials (ADT) program sought to vet the feasibility of, and increase trust in, AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves the high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies, each specialized to excel in a specific region of the state space. Both levels of the hierarchy are trained with off-policy, maximum-entropy methods, with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and placed second in the ADT championship event.
Pages: 1371-1385 (14 pages)
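
To make the two-level design described in the abstract concrete, here is a minimal sketch of a policy-selector hierarchy. Everything in it is illustrative: DummyDogfightEnv, LowLevelPolicy, PolicySelector, and the switch_every cadence are hypothetical placeholders rather than the authors' code, and the random-action stubs stand in for networks that the paper trains with off-policy, maximum-entropy RL (with expert knowledge folded in through reward shaping).

```python
import numpy as np


class DummyDogfightEnv:
    """Toy stand-in environment; the real ADT work used a high-fidelity
    F-16 simulation, which is not reproduced here."""

    def __init__(self, horizon=50, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.standard_normal(8)  # toy 8-dim observation

    def step(self, action):
        self.t += 1
        obs = self.rng.standard_normal(8)
        reward = -float(np.linalg.norm(action))  # placeholder shaped reward
        done = self.t >= self.horizon
        return obs, reward, done


class LowLevelPolicy:
    """Stand-in for one separately trained specialist (e.g., aggressive
    pursuit vs. defensive maneuvering); a trained network would go here."""

    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def act(self, obs):
        # Random placeholder control, e.g., stick/rudder/throttle axes.
        return self.rng.uniform(-1.0, 1.0, size=4)


class PolicySelector:
    """High-level policy that picks which specialist controls the aircraft."""

    def __init__(self, specialists):
        self.specialists = specialists

    def select(self, obs):
        # Placeholder scoring; a trained selector maps state -> choice.
        return int(abs(obs[0]) * 100) % len(self.specialists)


def run_episode(env, selector, switch_every=10):
    """Roll out one episode, re-selecting a specialist every few steps."""
    obs = env.reset()
    total, t, active, done = 0.0, 0, 0, False
    while not done:
        if t % switch_every == 0:
            active = selector.select(obs)
        obs, reward, done = env.step(selector.specialists[active].act(obs))
        total += reward
        t += 1
    return total


if __name__ == "__main__":
    env = DummyDogfightEnv()
    selector = PolicySelector([LowLevelPolicy(s) for s in range(3)])
    print("episode return:", run_episode(env, selector))
```

Letting the selector re-choose a specialist only every few steps, rather than every step, is one common way to keep the high-level decision problem short-horizon; the paper's exact switching scheme may differ from this sketch.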